With added emphasis on the use of data by many companies, from supermarkets to multinational corporations, new roles are developing for data scientists to analyse and investigate what lies behind the figures and what trends can be shown through the collection of data.
While the role may seem relatively straightforward, it can be exceptionally complex and there are key data science skills that are required to excel in the role. While every company will value particular skills and tools differently, we take a look at what makes a bad data scientist and what skills are essential to the role.
Not a team player
Data scientists must analyse information and look at what lies beneath the surface. Often, this can involve looking at large quantities of information and working as a unit. If a data scientist cannot function in a team or wants all of the glory, then they are not going to work well with others and produce the best results.
Poor mathematical background
Mathematics is one of the key tools in analysing data. Therefore, it is important that a data scientist has strong mathematical knowledge and can learn algorithms and other key tools quickly. Having a passion for maths will lead to a higher quality of work.
Poor computing knowledge
To succeed as a data scientist, it is important to have strong computer skills to calculate and present information. Not everything is analysed or presented on paper; thus, a strong digital background is key. If a data scientist doesn’t have knowledge of some of the key platforms, such as Spark, then chances are, they’re a bad one.
Poor communication skills
A data scientist has to bring clarity and insight to data. Regardless of whether they can memorise algorithms or key formulas, if they cannot communicate their findings or ideas, they will not succeed. A data scientist must be approachable and aid the performance of an organisation with good communication.
No business knowledge
A data scientist must have knowledge of the world of business and know what problems your company has and is trying to solve. If they fail to understand business issues, then how can they solve the problem?
Lack of knowledge about tools
When it comes to data science, there is an arsenal of tools that can be used to collate, analyse and present information, from Scala and Python to SAS and Matlab. A data scientist must have knowledge of most of these tools. If not, they are not a great fit for your business.
SAS-only knowledge
Similar to the point above, some “data scientists” have a knowledge of coding and thus have rebranded themselves as data scientists. However, knowing only about code does not mean that they know how to read or analyse data.
Don’t want to get their hands dirty
If a data scientist is unwilling to take risks, analyse data and dig into the code, then they will simply not fit into any organisation. Being a data scientist takes risk and a hard-working ethos.
A know it all
Nothing is ever the answer when analysing data until the data proves or matches the relevant theory. If a data scientist is convinced that they have the right answers all the time, then they will never be able to see out of their own prism, and thus will never be able to adequately review figures.
Lacking a natural sense of curiosity
Most data scientists need to find the answers and wish to find out the trends and data behind the figures. If a data scientist is not curious or is unmotivated to find out what makes things tick, this is exceptionally bad practice.
Bio: Seamus Breslin is the Founder and Managing Director of Solas Consulting, and has over 11 years’ experience in the IT sector. Solas specialises in placing Data, BI, SQL, Oracle, Java and .Net professionals.
Piotr Migdał, deepsense.io. http://www.kdnuggets.com/
In this post I try to summarize my advice. I don’t intend to write a complete walkthrough, but to provide a starting point, with links to further materials. I target it at people with an academic, quantitative background (e.g. physics, mathematics, statistics), regardless of whether they are undergraduate students, PhDs or after a few postdocs. Some points may be valid for other backgrounds (but then - use it at your own risk).
Here and everywhere else: please don’t take the approach of learning from book[s] first and only then playing - start with playing!
In short:
I had a strong background in physics and an interest in complex systems; I did a lot of academic programming and none of the practical kind.
After the 1st year of my PhD studies I started learning Python (for web scraping and plotting) on my own time.
9 months later I participated in a 1-month data science school (Big Dive in Turin).
8 months later I went to a summer internship in data science in San Francisco (for 4 months).
I started part-time freelancing (as I was finishing my PhD).
After finishing my PhD I made it my main activity.
All projects required me to learn something new - be it a library, a machine learning model or a software tool.
Analyzing real, and often dirty, data using a mixture of programming and statistics. Or, as Josh Wills put it:
Data scientist is a person who is better at statistics than any programmer and better at programming than any statistician.
From my perspective the whole process looks like this:
ask a question that is relevant to the project
get data (CSV, SQL, plain text)
process it (joining, cleaning, supplementing it)
run analysis (statistical tests or machine learning)
interpret and use results (being able to understand the above)
present results (a report, plot, interactive data visualization)
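As a rough illustration of the steps above, here is a minimal Python sketch that uses only the standard library. Everything in it is an assumption made for the sake of the example: the question ("do customers in the discount group spend more?"), the made-up inline CSV, and the helper names load_rows, clean, summarize and report. A real project would likely use a library such as pandas and a proper statistical test.

```python
import csv
import io
import statistics

# Hypothetical, made-up data standing in for a CSV export. The "n/a" and the
# empty field mimic the kind of dirt that real data tends to contain.
RAW_CSV = """customer,group,spend
a,discount,12.5
b,control,9.0
c,discount,n/a
d,control,7.5
e,discount,15.0
f,control,
g,discount,11.0
h,control,8.0
"""


def load_rows(text):
    """Get data: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Process: keep only rows whose 'spend' field parses as a number."""
    cleaned = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except (TypeError, ValueError):
            continue  # drop rows with missing or malformed values
        cleaned.append({"group": row["group"], "spend": spend})
    return cleaned


def summarize(rows):
    """Run analysis: mean spend per group (a summary statistic, not a test)."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(row["spend"])
    return {group: statistics.mean(values) for group, values in by_group.items()}


def report(summary):
    """Present results: one plain-text line per group, sorted by group name."""
    return "\n".join(
        f"{group}: mean spend {mean:.2f}" for group, mean in sorted(summary.items())
    )


rows = clean(load_rows(RAW_CSV))
summary = summarize(rows)
print(report(summary))
```

With this toy data, two of the eight rows are dropped during cleaning and the report shows a mean spend of 8.17 for the control group and 12.83 for the discount group. Interpreting that difference (is it meaningful with three customers per group?) is the step that no code does for you.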
And everything needs to be done in a reproducible way - so others can interact with your code, or even run it on a server. Depending on the job, there may be more emphasis on one part or the other. Or even look at this tweet - while humorous, it shows a balanced list of typical skills and activities of a data scientist:
If you want to learn more about what data science is, look at the following links:
The State of Data Science - RJMetrics Report - especially the plots of the top 20 skills of a data scientist and top 20 backgrounds of data scientists; corollary: you don’t have to know everything
Doing Data Science at Twitter - Robert Chang
Advice and insights from 25 amazing data scientists - interviews with practitioners holding various positions, having various backgrounds
When you have some academic title, no-one will question your intelligence. But they are justified in questioning your practical skills. In my experience, you need to fulfill two requirements:
have minimal skills so that you are useful starting from day 1 (e.g. you can get data and present summary statistics; they don’t want to start with teaching you Python and Git),
be able and eager to learn (both in general and their technologies in particular), and be self-driven enough to discover and solve new problems even without being explicitly guided.
Most data science things are simple, and once you are able to use R or Python you can start working, gradually increasing your knowledge and experience. That is, after a few months you should be ready to start an entry-level job.
Initially, I was afraid that my lack of 10+ years of experience with C++ and Java would be a problem. How could I compete with serious software engineers who majored in computer science? But it turned out that most of my commercial projects are for IT companies - they have wonderful programmers but often no-one proficient at dealing with real data. So (from Academia to Industry linked below):
While having a strong coding ability is important, data science isn’t all about software engineering (in fact, have a good familiarity with Python and you’re good to go). Data scientists live at the intersection of coding, statistics, and critical thinking.
See also:
How to leave academia - Chris Stucchio - getting practical skills, interviews, salary negotiations
Academia to Industry: Data Science Myths and Truths - Emily Thompson - so: no, you don’t have to do adverts or finance
In academia, you are allowed to cherry-pick an artificial problem and work on it for 2 years. The result needs to be novel, and you need to research previous and similar solutions. The solution needs to be perfect, even if not on time.
In industry, you should solve a given problem end-to-end. Things need to work, and there is little difference if it is based on an academic paper, usage of an existing library, your own code or an impromptu hack. The solution needs to be on time, even if just good enough and based on shady and poorly understood assumptions.
So, contrary to its name, it’s rarely science. That is, in data science the emphasis is on practical results (like in engineering) - not on the proofs, mathematical purity or rigor characteristic of academic science.
In the software industry a resume plays a different role than a CV in academia. Rather than being a complete record of all positions, awards and publications, it is a short (typically 1 page) summary of the main skills and the most important positions/accomplishments. It is used to screen candidates, not as the final judgement. To see the difference, compare and contrast my data science resume with my academic CV.
Applying for a job involves being asked technical questions - on the phone or Skype. For software engineering it involves both conceptual questions and whiteboard coding; for data science it may vary. In any case, take a look at:
If you need to learn basic algorithms and data structures, I recommend:
Algorithms at Coursera by Wayne and Sedgewick if you like MOOCs
T. Cormen, C. Leiserson, R. Rivest and C. Stein, Introduction to Algorithms - a classical, in-depth book
If you get no technical questions, it may be a red flag. If you get only software engineering questions, it may be a sign that they want to hire a programmer, not a data scientist (no matter what their job posting says); and given your background you want to be a Type A Data scientist (i.e. more a statistician than a regular programmer), according to this taxonomy.