There is no dearth of articles and blogs on the rising demand for big data professionals and data scientists. Big data does provide some of the hottest new jobs out there, but how does one get started on the path to a career in big data? In this blog, we will clarify exactly what a career in big data encompasses and how you can join the big data bandwagon even if you are not a data scientist or a data engineer. Let’s begin by understanding big data and its needs.
Big data implies data characterized by high volume, high velocity, high variety and uncertain veracity: data that is petabytes in size (1 petabyte = 1 million GB), streaming in at blazing fast speeds in real time, from a wide variety of sources and formats (social media, sensors embedded in devices, POS data, etc.) and laced with inaccuracies and incompleteness (you can’t trust everything on social media, phew!). Yet this data is very valuable because it can reveal insights about your business that you never knew before. Using this data, companies are unlocking knowledge that aids in making better business decisions, optimizing operations and beating the competition.
Examples are aplenty! Uber, which owns no taxis but has disrupted the traditional taxi business; AirBnB, which owns no real estate but has become an imposing competitor to the hotel giants; Netflix, which initially produced no content but became the most popular means of entertainment delivery. All three thrive on data. Data about who needs to get from point A to point B in a city and who is in the best position to help them make that journey. Data about who is looking for accommodation and who is willing to offer accommodation for rent. Data about what entertainment can be offered to person X to keep them engaged and loyal.
“Data is the new gold,” “data is the new oil,” “data is a currency” — you may have heard it all. But whether data can be compared to oil, wind or solar energy, one thing is clear: to get real value out of data, you need a multi-disciplinary team. Although data scientists and data engineers are at the top of the food chain in a big data team, there are a lot of other skills required by organizations to successfully build a big data strategy.
The different skills required can be aligned with the various layers of a big data infrastructure. For example, the physical infrastructure consisting of servers, network, clusters of nodes, storage, etc. forms the backbone of a big data implementation. This can make or break a big data project. Server administrators and network administrators are needed to manage capacity planning and the smooth functioning of this infrastructure. If opting for a cloud model instead of an on-premise infrastructure, a cloud engineer who understands the concepts of cloud computing and the big data tools offered by cloud providers is essential. Security experts are required to perform a range of functions for securing data, including enforcing data governance and access constraints, providing encryption for sensitive data at rest and in motion, and protecting against cyber attacks and other sophisticated hacking attempts.
Administering data stores that hold big data is another critical role. Database administrators, data warehouse managers and data architects fill this gap. These folks are database experts who define how the data of the company is stored. Organizing the storage systems, data warehouses, databases and data lakes is their responsibility. They design and develop logical and physical layers for various databases and recommend solutions for database security, testing, data back-up and recovery. They are also responsible for integrating big data tools like NoSQL and Hadoop with data warehouses and RDBMS. Skills required include knowledge of RDBMS concepts, data warehouse architectures, ETL development, business intelligence tools, Hadoop and NoSQL.
Big data administrators are critical for any big data team. These are the people responsible for building and managing a big data cluster and platform that may include Hadoop, Hive, HBase, HDFS, YARN, Pig, Storm, Kafka, Cassandra, MongoDB or other NoSQL stores. Responsibilities include planning capacity for hardware and software, installing big data platforms, monitoring system performance, managing cluster connectivity and security, applying patches and upgrades, and automating deployment, customization and monitoring through DevOps tools. This role demands collaboration with the infrastructure, network, security, database, application and business intelligence teams. Although this is a highly specialized role, it is also an opportunity for Linux administrators to transition into big data with appropriate training and exposure.
Hadoop developers, big data engineers and big data developers are the data plumbers. Rising in demand, these roles are essential to building the data pipelines and frameworks that ultimately provide data for the data scientists to analyze. This job includes ingesting data from a variety of sources and processing data in both batch and real-time modes. These people usually come from a development background with programming experience in languages like Python, Java, Scala, Ruby or Perl, or in Bash scripting. Development expertise in the Hadoop ecosystem includes Hadoop, Hive, Pig, Kafka, Storm and Spark.
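To make the idea of a data pipeline a little more concrete, here is a minimal sketch in plain Python. It is only an illustration, not how a production pipeline on Hadoop, Kafka or Spark would be written. The sample data, the field names (`store`, `amount`) and all function names are made up for this example. It ingests records from two differently formatted sources, then processes them once as a batch and once as a stream of running totals.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator

# Hypothetical sample data standing in for two differently formatted sources.
POS_CSV = "store,amount\nA,10.0\nB,5.5\nA,4.5\n"
ONLINE_JSONL = '{"store": "B", "amount": 2.0}\n{"store": "A", "amount": 1.0}\n'


def ingest_csv(text: str) -> Iterator[Dict]:
    """Read CSV rows and normalize them to a common record shape."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"store": row["store"], "amount": float(row["amount"])}


def ingest_jsonl(text: str) -> Iterator[Dict]:
    """Read JSON-lines records and normalize them to the same shape."""
    for line in text.splitlines():
        if line.strip():
            rec = json.loads(line)
            yield {"store": rec["store"], "amount": float(rec["amount"])}


def all_records() -> Iterator[Dict]:
    """Combine both sources into one stream of normalized records."""
    yield from ingest_csv(POS_CSV)
    yield from ingest_jsonl(ONLINE_JSONL)


def batch_totals(records: Iterable[Dict]) -> Dict[str, float]:
    """Batch mode: consume every record, then return one final result."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["store"]] += rec["amount"]
    return dict(totals)


def streaming_totals(records: Iterable[Dict]) -> Iterator[Dict[str, float]]:
    """Streaming mode: emit an updated snapshot after each record arrives."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["store"]] += rec["amount"]
        yield dict(totals)


if __name__ == "__main__":
    print("batch:", batch_totals(all_records()))
    for snapshot in streaming_totals(all_records()):
        print("stream:", snapshot)
```

The batch function answers only after it has seen everything, while the streaming function gives an answer that keeps improving as data arrives. Real-time frameworks apply the same idea across a cluster.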
Data scientists are highly specialized unicorns who have a rich mix of expertise in statistics, machine learning, data mining, operations research, mathematics and computer programming. They are skilled at diving into messy data, finding insights and presenting them to people in a way that can be easily understood. This role includes mining petabytes of data, building algorithms, running live prototypes to demonstrate the operational efficiencies gained from those algorithms, and then writing robust code to automate the algorithms and prove their performance on large-scale live data. In addition to familiarity with the Hadoop ecosystem, this role demands experience in statistical modeling, deep learning, neural networks and natural language processing, analytical tools like R, SAS and Matlab, and programming languages like Java, Scala, Python or C++.
Data analysts are an extension of business analysts and are required to bridge the gap between business and engineering. These folks are responsible for understanding the business requirements, identifying the sources and elements of data required for processing, modeling the data, and defining visualization requirements and reporting formats.
Some organizations define the role of a data analyst in a different way, more as a junior data scientist. In this role, analysts need experience in interpreting both structured and unstructured data, and they are expected to research and develop metrics like Key Performance Indicators using statistical analysis methods. Familiarity with Hadoop, Hive, Spark and Python is expected so that they can take advantage of large-scale data processing frameworks and APIs.
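As a small illustration of what developing a metric with statistical analysis can look like, here is a minimal sketch using only the Python standard library. The daily order counts are invented, and the function name and the "more than two standard deviations from the mean" rule are assumptions chosen for this example, not a standard KPI definition.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical daily order counts for one week.
daily_orders = [120, 132, 128, 125, 210, 130, 127]


def kpi_summary(values: List[float]) -> Dict[str, object]:
    """Summarize a metric and flag days that deviate strongly from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    # Flag indexes whose value is more than two standard deviations away.
    unusual = [
        i for i, v in enumerate(values)
        if sigma > 0 and abs(v - mu) > 2 * sigma
    ]
    return {"mean": mu, "stdev": sigma, "unusual_days": unusual}


if __name__ == "__main__":
    print(kpi_summary(daily_orders))
```

On data at big data scale, the same calculation would typically be expressed in Hive or Spark rather than in a single Python process.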
Visualization developers are proficient in visualization tools like Tableau and Qlik. They serve as a technical resource for accessing disparate sources of data and integrating these sources of information into a common and interactive platform through Tableau, etc. Visualization developers perform analysis to build data sources leveraging existing data marts and warehouses, design dashboard visualizations tailored to customer needs and execute performance testing to deploy dashboards to large audiences. They are also responsible for producing and distributing weekly, monthly and quarterly key performance management reporting.
Project managers are required for charting schedules, managing budgets and enabling communication between the different players in a big data project. Development managers are needed to mentor data engineers and data scientists, establish deliverables, and manage communication with senior executives so the work ultimately aligns with the business’s strategy. A mix of management experience and familiarity with big data techniques and tools is mandatory for these roles.
The graph above, obtained from indeed.com, confirms the growing demand for big data analytics jobs, which is only expected to gain momentum in the coming years. Although the demand is going up steadily, there is a huge deficit on the supply side. This presents an attractive opportunity for IT professionals who can invest in education and self-develop key skills required in the big data area to ultimately give a boost to their careers and businesses.
If you have further questions, please post a comment and we will get back to you!