With the current trends in technology, I am certain you have heard the terms Machine Learning and Data Science and wondered what the h*ll they are. How is it all done? Does it require programming? If you have questions like these, you are reading the right article: your answers are just a scroll away.
In this article, you will get to know what Machine Learning and Data Science are, and then go further and have a comprehensive introduction to the basic programming language syntax needed to join this field of Science and Technology.
Pre-requisites
Prior programming knowledge
Jupyter Notebook IDE
Python 3.6 and above
Let's dive right into your first steps in learning programming for Machine Learning and Data Science.
Machine Learning and Data Science, What are they?
These two terms are related but are different fields. Data Science brings structure to big data and Machine Learning focuses on learning from the data itself.
A detailed differentiation between the 2 terms can be found here: https://ibm.com/blog/data-science-vs-machine-learning-whats-the-difference/
According to Glassdoor, Machine Learning Engineers and Data Scientists in the US earn an average salary of 150K USD annually.
Below, we shall walk step by step through the core syntax of Python, the most used language for Machine Learning and Data Science libraries, and then close with where to go next.
Print and Print formatting
In Python, there is a function called print(), which prints its output on the screen. Let's see this in the example below.
```python
greeting = 'Hello world'
print(greeting)
# Outputs Hello world on the screen
```
The important part to master here is the Print formatting: how to format the kind of output of a variable you want.
Given the variables below
age = 12
name = 'John Doe'
The code for formatting your output is shown below:
print('My name is {} and I am {} years old'.format(name, age))
In the above code, you have to pass the variables to .format() in the order in which you want them to appear. To avoid that, you can name the placeholders instead:
print('My name is {one} and I am {two} years old'.format(two=age, one=name))
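Since the prerequisites mention Python 3.6 and above, the same output can also be produced with an f-string. The snippet below is only a small sketch that reuses the sample values from above; the `height` variable is a made-up value added to illustrate a format specifier.

```python
# Sample values reused from the text; `height` is a hypothetical extra value.
age = 12
name = 'John Doe'
height = 1.4567

# str.format() with positional placeholders
positional = 'My name is {} and I am {} years old'.format(name, age)

# f-string (Python 3.6+): expressions inside the braces are evaluated in place
fstring = f'My name is {name} and I am {age} years old'

# A format specifier after the colon, here two decimal places
formatted_height = f'{height:.2f}'

print(positional)
print(fstring)
print(formatted_height)
```

Both styles produce the same sentence, so which one you use is mostly a matter of readability.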
Lists, Tuples, Tuple unpacking, Sets, and List comprehension
In Python, we have 4 in-built data structures: lists, tuples, dictionaries, and sets.
A Python list is an in-built data structure that holds a collection of items. To see how it works, let's look at the example below.
We want to create a list of the five vowels. The elements should be enclosed in square brackets [ ] and separated from one another by commas, as shown:
vowels = ['a', 'e', 'i', 'o', 'u']
When we want to add an element to the end of a list, we use the append() method:
vowels.append('a')
print(vowels)
# the output is ['a', 'e', 'i', 'o', 'u', 'a']
This means that lists allow duplicate elements. Another important method here is remove(), which removes the first matching element from the list.
vowels.remove('e')
# removes the element 'e' from the list of vowels
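To see append() and remove() working together, here is a short sketch built on the same vowels list. The variable names `first`, `last`, and `count` are only illustrative.

```python
vowels = ['a', 'e', 'i', 'o', 'u']

vowels.append('a')   # the list now ends with a second 'a'
vowels.remove('a')   # remove() deletes only the FIRST matching element

first = vowels[0]    # indexing starts at 0
last = vowels[-1]    # negative indexes count from the end
count = len(vowels)  # number of elements in the list

print(vowels, first, last, count)
```

Notice that after remove('a') the duplicate at the end of the list is still there, because only the first match is removed.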
Nesting lists
Lists can be nested inside one another, as shown below:
nest = [1, 2, [3, 4]]
print(nest[2]) #outputs [3, 4]
print(nest[2][1]) #outputs 4
Python tuples
A tuple is another in-built data structure that can hold a collection of elements. The difference between a tuple and a list is that the elements of a tuple usually describe a single record, whereas a list holds a collection of separate items. A tuple is written with parentheses ( ). For example:
# list
classmates = ['ajika', 'angelo', 'john', 'foe']
A tuple, on the other hand, is mostly used this way:
classmate = ('ajika', '12', 'single')
The list holds many classmates, whereas the tuple holds a single classmate with details such as name, age, and relationship status.
Both tuples and lists allow duplicate elements, but a tuple is immutable: it does not support changing, adding, or removing elements. For a tuple t, an assignment such as
t[0] = 1
is not allowed and raises a TypeError.
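The immutability described above can be checked with a few lines. This is a minimal sketch that reuses the classmate tuple; `error_message` and `updated` are names introduced here just for the demonstration.

```python
classmate = ('ajika', '12', 'single')

# Reading from a tuple works exactly like reading from a list
name = classmate[0]

# Writing to a tuple is not allowed and raises a TypeError
try:
    classmate[0] = 'angelo'
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# To "change" a tuple, build a new one instead
updated = ('angelo',) + classmate[1:]

print(name, error_message, updated)
```

The original tuple stays untouched; only the new tuple carries the changed name.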
Let's dive into what is called tuple unpacking:
Using this sample code, we elaborate on how tuple unpacking works
x = [(1, 2), (3, 4), (5, 6)]
for a in x:
    print(a)
# outputs the tuples (1, 2), (3, 4), (5, 6)
for a, b in x:
    print(a, b)
# outputs the unpacked values 1 2, 3 4, 5 6
This technique is useful in iterating through a list of tuples.
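As a sketch of why unpacking is handy, the snippet below loops over a made-up list of (name, age) records and also shows unpacking outside a loop. The data and variable names are assumptions used purely for illustration.

```python
# Hypothetical records: each tuple is (name, age)
records = [('ajika', 12), ('angelo', 15), ('john', 11)]

names = []
total_age = 0
for name, age in records:   # each tuple is unpacked into two variables
    names.append(name)
    total_age += age

# Unpacking also works outside loops, e.g. to swap two values
first, second = 1, 2
first, second = second, first

print(names, total_age, first, second)
```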
Sets
Just like lists and tuples, a set also holds many elements. It is written with curly braces { }. To see how it differs from the two data structures above, let's look at the example below:
vowels = {'e', 'a', 'e', 'i', 'o', 'u', 'i'}
print(vowels)
# The output is something like {'e', 'a', 'i', 'o', 'u'} (the order may vary)
From the above, you notice that sets are unordered and do not keep duplicates, so only the unique elements remain.
Note that sets are mutable and can be modified, unlike tuples.
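Here is a small sketch of the two properties mentioned above, uniqueness and mutability. It uses sorted() only to get a predictable order for printing, since a set itself has no guaranteed order.

```python
letters = ['e', 'a', 'e', 'i', 'o', 'u', 'i']

# Converting a list to a set drops the duplicates
unique = set(letters)

# Sets are mutable: elements can be added and removed
unique.add('y')
unique.discard('y')

# Membership tests are a common use of sets
has_a = 'a' in unique

# sorted() returns a list in a predictable order
ordered = sorted(unique)

print(ordered, has_a)
```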
Dictionaries
A dictionary is a collection of key-value pairs. It can be modified and does not allow duplicate keys.
Let's look at some sample code to understand the syntax of this data structure:
d = {'key1': 'value1',
     'key2': 'value2',
     'num': 12}
print(d['key1'])  # outputs value1
print(d['num'])   # outputs 12
A dictionary can also include a list as one of its values, as below:
d = {'k1': [1, 2, 3]}
d['k1'][1]  # outputs 2
Dictionaries can also be nested, as below:
d = {'k2': {'innerK': [1, 2, 3]}}
d['k2']['innerK'][0]  # outputs 1
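Because dictionaries can be modified, a quick sketch of adding, updating, and safely reading keys may help. The key 'unknown' and the default 'n/a' are made up for the example.

```python
d = {'key1': 'value1', 'num': 12}

# Adding a new key and updating an existing one
d['key2'] = 'value2'
d['num'] = d['num'] + 1

# get() returns a default instead of raising KeyError for a missing key
missing = d.get('unknown', 'n/a')

# Nested access reads from the outside in
nested = {'k2': {'innerK': [1, 2, 3]}}
inner_first = nested['k2']['innerK'][0]

print(d, missing, inner_first)
```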
Conditions and If, Elif and Else
Python has conditional statements just like any other language.
They are built on comparison operators such as:
#Equals
a == b
#Not equals
a != b
#Less than
a < b
#Less than or equal
a <= b
#Greater than
a > b
#Greater than or equal
a >= b
These conditions are commonly used, especially in if statements:
a = 10
b = 20
if b > a:
    print("b is greater than a")
Notice the Python indentation for block statements: the body of a block must be indented consistently (four spaces is the usual convention, which is what the TAB key normally inserts in Jupyter Notebook). Inconsistent indentation results in an IndentationError.
Elif
This is used when the previous condition is not true and you want to try another condition, with else as the fallback when none of them match. Below is a sample applying if, elif, and else:
if b > a:
    print("b is greater than a")
elif a == b:
    print("a is equal to b")
else:
    print("a is greater than b")
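To try all three branches at once, the same logic can be wrapped in a small function. This is only a sketch; the function name `compare` and the sample inputs are my own choices.

```python
def compare(a, b):
    """Return a sentence describing how a and b relate."""
    if b > a:
        return "b is greater than a"
    elif a == b:
        return "a is equal to b"
    else:
        return "a is greater than b"

# One call per branch
results = [compare(10, 20), compare(5, 5), compare(7, 3)]
for line in results:
    print(line)
```

Only the first condition that is true runs, and the else branch runs when none of them are.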
Loops: for and while
Loops here work the same way as in other programming languages; only the syntax is different.
For the for loop, below is a sample of what it looks like:
seq = [1, 2, 3, 4, 5]
for num in seq:
    print(num)
# This outputs all the numbers in the list, as below
"""
1
2
3
4
5
"""
# For the while loop, below is a sample code
i = 1
while i < 5:
    print('i is: {}'.format(i))
    i = i + 1
# this outputs
"""
i is: 1
i is: 2
i is: 3
i is: 4
"""
Python functions
The syntax for functions in Python is shown below:
def my_func(param1):
    print(param1)
# function call
my_func("Hello world")
# outputs Hello world, which the function my_func receives as the argument for its parameter param1
Functions in Python support docstrings: string literals, usually written with triple quotes so they can span multiple lines, placed as the first statement of the function. They help describe what a given function does.
In our function above let’s include a docstring:
def my_func(param1):
    """
    THIS FUNCTION PRINTS YOUR INPUT ARGUMENT
    """
    print(param1)
my_func("Hello world")
To check the docstring in Jupyter Notebook, place the cursor at the end of the function call, then press Shift + Tab.
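Outside the notebook shortcut, a docstring can also be read from code. The sketch below uses the built-in help() function and the __doc__ attribute; the one-line docstring text is my own shortened wording.

```python
def my_func(param1):
    """Print the input argument."""
    print(param1)

# The docstring is stored on the function object itself
doc = my_func.__doc__
print(doc)

# help() prints the function signature together with its docstring
help(my_func)
```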
Let's talk about one of the commonly used in-built functions in Python.
Do you recall the example we gave for the for loop above? It can be simplified using the range(initial, final) function, which produces the numbers from initial up to, but not including, final:
print(list(range(1, 5)))
# outputs [1, 2, 3, 4]
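Here is a short sketch of range() inside loops. Note that the end value is excluded, so range(1, 6) is what covers 1 through 5. The optional third argument (the step) and the variable names are additions for illustration.

```python
# for loop over 1..5: the end value 6 is not included
collected = []
for num in range(1, 6):
    collected.append(num)

# An optional third argument sets the step size
evens = list(range(0, 10, 2))

# A while loop needs its counter set up and updated by hand
countdown = []
i = 3
while i > 0:
    countdown.append(i)
    i = i - 1

print(collected, evens, countdown)
```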
List comprehension
I hope you still recall the syntax for lists, because here is a way to simplify for loops using the list comprehension technique:
x = [1, 2, 3, 4]
out = []
for num in x:
    out.append(num**2)
print(out)
# outputs [1, 4, 9, 16]
The above code takes each value in the list x, squares it, and builds a list of the squared values.
This can be simplified using list comprehension, as shown below:
out = [num**2 for num in x]
print(out)
#outputs [1, 4, 9, 16]
To master this syntax, think of it as a for loop written in reverse inside square brackets: the expression comes first, followed by the for clause.
We are good to go.
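A comprehension can also carry a condition, and the same idea works for dictionaries. The sketch below extends the squares example; the names `even_squares` and `square_by_num` are illustrative.

```python
x = [1, 2, 3, 4]

# Plain comprehension: expression first, then the for clause
squares = [num**2 for num in x]

# An optional if clause keeps only the items that pass the test
even_squares = [num**2 for num in x if num % 2 == 0]

# The same pattern with curly braces and key: value builds a dictionary
square_by_num = {num: num**2 for num in x}

print(squares, even_squares, square_by_num)
```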
Lambda expressions, Map, and Filter functions
A lambda function is a small anonymous function or shorthand function. It can take any number of arguments, but can only have one expression.
Its syntax is shown in the example below;
t = lambda var: var*2
print(t(6))
# outputs 12, which is the argument 6 multiplied by 2
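Lambdas are most often passed straight into another function rather than stored in a variable. As a sketch, here one is used as the sort key for a made-up list of (name, age) tuples, and another takes two arguments.

```python
# Hypothetical records: each tuple is (name, age)
classmates = [('ajika', 12), ('angelo', 15), ('john', 11)]

# The lambda tells sorted() to order the records by age (index 1)
by_age = sorted(classmates, key=lambda record: record[1])

# A lambda can take several arguments but holds a single expression
add = lambda a, b: a + b
total = add(2, 3)

print(by_age, total)
```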
The map() function in Python is also a very important in-built function, commonly applied in Machine Learning and Data Science. It applies a function to every item of a sequence.
Let's explore its syntax:
def times2(val):
    return val*2
seq = [1, 2, 3, 4, 5]
list(map(times2, seq))
# This outputs the list [2, 4, 6, 8, 10]
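map() pairs naturally with lambda, and it is worth knowing that it returns a lazy iterator rather than a list. The following is a small sketch of both points, using the same seq list.

```python
seq = [1, 2, 3, 4, 5]

# The same doubling as times2, written with a lambda
doubled = list(map(lambda val: val * 2, seq))

# map() can walk several sequences in parallel
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))

# map() is lazy: values are produced only when requested
lazy = map(str, seq)
first = next(lazy)   # consumes the first item
rest = list(lazy)    # consumes whatever is left

print(doubled, sums, first, rest)
```

This laziness is why the article's example wraps map() in list() before displaying the result.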
filter() function
This returns an iterator containing only the items for which a test function returns True.
Let's explore and understand this through this example:
seq = [1, 2, 3, 4, 5]
list(filter(lambda num: num % 2 == 0, seq))
# outputs [2, 4]
The above code uses a lambda function that computes the remainder of each element of the seq list when divided by 2; if the remainder is 0, the element is kept.
Hence the output is [2, 4].
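For comparison, the sketch below shows the filter() call next to an equivalent list comprehension, plus the special case of passing None as the function, which keeps only truthy items. The `words` list is a made-up example.

```python
seq = [1, 2, 3, 4, 5]

# filter() with a lambda as the test function
evens = list(filter(lambda num: num % 2 == 0, seq))

# The same result written as a list comprehension
evens_comp = [num for num in seq if num % 2 == 0]

# Passing None drops falsy items such as empty strings
words = ['data', '', 'science', '']
non_empty = list(filter(None, words))

print(evens, evens_comp, non_empty)
```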
Other important methods or functions
Some of the important methods or functions that you need to know before getting into machine learning or data science are shown below:
Let's explore methods on the string data type:
s = "Hello, my name is Sam"
s.lower() converts the whole string to lowercase.
s.upper() does the opposite of lower(): it converts the whole string to uppercase.
s.split() splits the string into a list of words; it is a very useful method in text analysis for Natural Language Processing and Large Language Models.
Let's take an example to understand it more:
tweet = "Go Sports! #Sports"
tweet.split()  # returns a list of this form: ['Go', 'Sports!', '#Sports']
# Now when we say:
tweet.split("#")
# it splits the tweet in two at the '#' sign,
# in this case it looks like this:
# ['Go Sports! ', 'Sports']
Thus, tweet.split('#')[1] returns 'Sports', the text that follows the '#' sign.
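Putting these string methods together, here is a rough sketch of pulling every hashtag out of a tweet. The second hashtag in the sample tweet is made up, and this is only one simple way to do it, not a full text-processing approach.

```python
tweet = "Go Sports! #Sports #Fun"

# Lowercase first so that '#Sports' and '#sports' are treated alike
words = tweet.lower().split()

# Keep the words that start with '#', dropping the '#' itself
hashtags = [word[1:] for word in words if word.startswith('#')]

# join() is the reverse of split()
joined = ' '.join(words)

print(words, hashtags, joined)
```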
Let's explore the functions on dictionaries:
d = {'k1': 1, 'k2': 2}
d.keys() returns the keys in the dictionary.
d.items() returns the dictionary's key-value pairs.
d.values() returns the dictionary values.
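These three methods are mostly used in loops. The sketch below wraps each in list() to show its contents and then iterates over items() with tuple unpacking; the `doubled` dictionary is just an illustration.

```python
d = {'k1': 1, 'k2': 2}

# Wrapping the views in list() makes their contents easy to inspect
keys = list(d.keys())
values = list(d.values())
pairs = list(d.items())

# items() yields (key, value) tuples, which unpack neatly in a for loop
doubled = {}
for key, value in d.items():
    doubled[key] = value * 2

print(keys, values, pairs, doubled)
```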
Let's also explore functions on Lists:
Given a list lst = [1, 2, 3]:
lst.pop() removes and returns the last element, in this case 3. We can also give the index to pop, e.g. lst.pop(1), which removes 2 from the original list.
2 in [1, 2, 3] returns True, while 'x' in [1, 2, 3] returns False.
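A quick sketch of pop() and the in operator follows. The list is named `numbers` here to avoid shadowing Python's built-in list; that naming is my own choice.

```python
numbers = [1, 2, 3]

last = numbers.pop()      # removes and returns the last element
second = numbers.pop(1)   # removes and returns the element at index 1

# `in` compares values, so the integer 1 and the string '1' are different
has_one = 1 in numbers
has_text_one = '1' in numbers

print(last, second, numbers, has_one, has_text_one)
```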
Where to go from here
We have come this far with you, Mr. / Mrs. Programmer. Congratulations!
You might now be asking yourself how this article could be of use to you. You now have the basic understanding of Python that enables you to navigate the libraries for Machine Learning and Data Science.
These libraries include the following:
Pandas
NumPy
Seaborn
Matplotlib
scikit-learn
These libraries will enable you to code Python very efficiently in Jupyter Notebook as you deal with various tasks like Data visualization, feature engineering, data analysis, data plotting, and coding most of the commonly used algorithms in the world of Machine Learning and Data Science.
I wish you good luck on your journey to becoming one of the greatest engineers in the fields of Data Science, Machine Learning, and Artificial Intelligence.
I do recommend some resources for you to explore for your future learning:
You can also reach me on my LinkedIn at Ajika Angelo where I will share with you many free courses for this field to help you in your learning journey.
I will be glad to see how far your journey will go.