DS job

With added emphasis on the use of data by many companies, from supermarkets to multinational corporations, new roles are developing for data scientists to analyse and investigate what lies behind the figures and what trends can be shown through the collection of data.

While the role may seem relatively straightforward, it can be exceptionally complex, and there are key data science skills that are required to excel in it. While every company will value particular skills and tools differently, we take a look at what makes a bad data scientist and what skills are essential to the role.

Not a team player

Data scientists must analyse information and look at what lies beneath the surface. Often, this can involve looking at large quantities of information and working as a unit. If a data scientist cannot function in a team or wants all of the glory, then they are not going to work well with others and produce the best results.

Poor mathematical background 

Mathematics is one of the key tools in analysing data. Therefore, it is important that a data scientist has strong mathematical knowledge and can learn algorithms and other key tools quickly. Having a passion for maths will lead to a higher quality of work.

Poor computing knowledge

To succeed as a data scientist, it is important to have strong computer skills to calculate and present information. Not everything is analysed or presented on paper; thus, a strong digital background is key. If a data scientist doesn’t have knowledge of some of the key platforms, such as Spark, then chances are, they’re a bad one.

Poor communication skills

A data scientist has to bring clarity and insight to data. Regardless of whether they can memorise algorithms or key formulas, if they cannot communicate their findings or their ideas, they will not succeed as a data scientist. A data scientist must be approachable and aid the performance of an organisation with good communication.

No business knowledge

A data scientist must have knowledge of the world of business and understand the problems your company has and is trying to solve. If they fail to understand business issues, then how can they solve the problem?

Lack of knowledge about tools

When it comes to data science, there is an arsenal of tools that can be used to collate, analyse and present information, from Scala and Python to SAS and MATLAB. A data scientist must have knowledge of most of these tools. If not, they are not a great fit for your business.

SAS-only knowledge

Similar to the point above, some “data scientists” have a knowledge of coding and have thus rebranded themselves as data scientists. However, knowing only how to code does not mean that they know how to read or analyse data.

Don’t want to get their hands dirty

If a data scientist is unwilling to take risks, analyse data and dig into the code, then they will simply not fit into any organisation. Being a data scientist takes risk and a hard-working ethos.

A know-it-all

Nothing is ever the answer when analysing data until the data proves or matches the relevant theory. If a data scientist is convinced that they have the right answers all the time, then they will never be able to see out of their own prism, and so they will never be able to adequately review figures.

Lacking a natural sense of curiosity 

Most data scientists need to find the answers and wish to find out the trends and data behind the figures. If a data scientist is not curious, or is unmotivated to find out what makes things tick, this is exceptionally bad practice.

Bio: Seamus Breslin is the Founder and Managing Director of Solas Consulting, and has over 11 years' experience in the IT sector. Solas specialises in placing Data, BI, SQL, Oracle, Java and .Net professionals.

Piotr Migdał, deepsense.io. http://www.kdnuggets.com/

In this post I try to summarize my advice. I don’t intend to write a complete walkthrough, but to provide a starting point, with links to further materials. I target it at people with an academic, quantitative background (e.g. physics, mathematics, statistics), regardless of whether they are undergraduate students, PhDs, or people after a few postdocs. Some points may be valid for other backgrounds (but then - use it at your own risk).

Here and everywhere else: please don’t take the approach of “learn book[s], then play” - start with playing!

My story

In short:

All projects required me to learn something new - be it a library, a machine learning model or a software tool.

What is data science?

Analyzing real, and often dirty, data using a mixture of programming and statistics. Or, as Josh Wills put it:

Data scientist is a person who is better at statistics than any programmer and better at programming than any statistician.

From my perspective the whole process looks like this:

And everything needs to be done in a reproducible way - so others can interact with your code, or even run it on a server. Depending on the job, there may be more emphasis on one part or the other. Or even look at this tweet - while humorous, it shows a balanced list of typical skills and activities of a data scientist:

If you want to learn more about what data science is, look at the following links:

On the transition

When you have some academic title, no-one will question your intelligence. But they are justified in questioning your practical skills. From my experience, you need to fulfill two requirements:

Most data science tasks are simple, and once you are able to use R or Python you can start working, gradually increasing your knowledge and experience. That is, after a few months you should be ready to start an entry-level job.

Initially, I was afraid that my lack of 10+ years of experience with C++ and Java would be a problem. How could I compete with serious software engineers who majored in computer science? But it turned out that most of my commercial projects are for IT companies - they have wonderful programmers but often no-one proficient at dealing with real data. So (from Academia to Industry linked below):

While having a strong coding ability is important, data science isn’t all about software engineering (in fact, have a good familiarity with Python and you’re good to go). Data scientists live at the intersection of coding, statistics, and critical thinking.

See also:

Priorities

In academia, you are allowed to cherry-pick an artificial problem and work on it for 2 years. The result needs to be novel, and you need to research previous and similar solutions. The solution needs to be perfect, even if not on time.

In industry, you should solve a given problem end-to-end. Things need to work, and there is little difference if it is based on an academic paper, usage of an existing library, your own code or an impromptu hack. The solution needs to be on time, even if just good enough and based on shady and poorly understood assumptions.

So, contrary to its name, it’s rarely science. That is, in data science the emphasis is on practical results (like in engineering) - not proofs, mathematical purity or rigor characteristic of academic science.

Resume vs academic CV

In the software industry, a resume plays a different role than a CV in academia. Rather than being a complete record of all positions, awards and publications, it is a short (typically 1 page) summary of the main skills and the most important positions/accomplishments. It is used to screen candidates, not as the final judgement. To see the difference, compare and contrast my data science resume with my academic CV.

Interviews

Applying for a job involves being asked technical questions - on the phone or Skype. For software engineering it involves both conceptual questions and whiteboard coding; for data science it may vary. In any case, take a look at:

If you need to learn basic algorithms and data structures, I recommend:

If you get no technical questions, it may be a red flag. If you get only software engineering questions, it may be a sign that they want to hire a programmer, not a data scientist (no matter what their job posting says); and given your background you want to be a Type A data scientist (i.e. more a statistician than a regular programmer), according to this taxonomy.