Deep Learning is a subset of machine learning that uses artificial neural networks, loosely inspired by the human brain, to perform tasks under varied conditions and environments, learning from experience and from given data. It processes information through multiple layers, extracting progressively higher-level features from raw data to produce the desired results.

Deep Learning is built around four key architectures, which are as follows:

1. Deep Neural Network (DNN)

A Deep Neural Network (DNN) is an Artificial Neural Network (ANN) with multiple hidden layers between the input and output layers. A DNN finds the most probable output for a given input by performing a chain of mathematical transformations across these layers, and it can model both linear and non-linear relationships.
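
As a rough illustration, each layer applies a linear transformation followed by a non-linear activation, and stacking several such layers is what lets the network capture non-linear relationships. The following is a minimal NumPy sketch (layer sizes, weights, and names are illustrative, not any particular library's API):

```python
import numpy as np

def relu(z):
    # Non-linear activation: element-wise max(0, z).
    return np.maximum(0.0, z)

def dnn_forward(x, layers):
    """Forward pass through a stack of (weights, bias) layers."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)  # linear map followed by a non-linearity
    return a

rng = np.random.default_rng(0)
# A toy 3-layer network: 4 inputs -> 8 hidden -> 8 hidden -> 2 outputs.
sizes = [4, 8, 8, 2]
layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
print(dnn_forward(rng.standard_normal(4), layers))
```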

The term “Deep Neural Network” refers to this large stack of layers. The network assigns a probability to each candidate output, and the probability threshold at which an output is accepted can be managed by the user or data scientist, allowing recognition results to be verified and cross-checked.
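
For example, a classifier's final layer is often passed through a softmax to turn raw scores into per-class probabilities, which can then be compared against a chosen cutoff. A minimal sketch (the 0.8 threshold is an arbitrary assumption for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating, for numerical stability.
    z = np.exp(logits - logits.max())
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0])  # raw scores from the final layer
probs = softmax(logits)              # roughly [0.79, 0.18, 0.04]
threshold = 0.8                      # arbitrary, user-chosen cutoff
prediction = probs.argmax() if probs.max() >= threshold else None
print(probs, prediction)             # None here means "not confident enough"
```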

Today, powerful GPUs and large-scale data collection make it practical to train deep neural networks with hundreds of layers and millions or even billions of parameters.

2. Deep Belief Network (DBN)

A Deep Belief Network (DBN) is a class of deep neural network composed of multiple layers of hidden units. The main difference between a DNN and a DBN is that a DBN has connections only between successive layers, with no connections between the units within a layer.

DBNs are trained layer by layer using a greedy approach. A greedy algorithm solves problems by making the locally optimal choice at each stage; here, each layer is trained on its own, learning a representation of its input before the next layer is trained on top of it.
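
One common construction (assumed here) builds each layer as a Restricted Boltzmann Machine (RBM) trained with contrastive divergence, then feeds its outputs to the next layer. The sketch below omits biases and many practical details:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Train one RBM layer with single-step contrastive divergence (CD-1).
    Biases are omitted for brevity."""
    W = rng.standard_normal((data.shape[1], n_hidden)) * 0.1
    for _ in range(epochs):
        # Positive phase: hidden activations driven by the data.
        h_prob = sigmoid(data @ W)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible units, then re-infer hidden.
        v_recon = sigmoid(h_sample @ W.T)
        h_recon = sigmoid(v_recon @ W)
        # Nudge W so reconstructions match the data better.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
    return W

# Greedy stacking: train layer 1, then train layer 2 on layer 1's outputs.
data = (rng.random((100, 12)) < 0.5).astype(float)  # toy binary data
W1 = train_rbm(data, n_hidden=8)
hidden1 = sigmoid(data @ W1)          # layer 1's learned representation
W2 = train_rbm(hidden1, n_hidden=4)   # layer 2 learns on top of layer 1
```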

3. Recurrent Neural Network (RNN)

Recurrent Neural Networks (RNNs) are a class of ANNs in which connections between nodes form cycles, giving the network an internal state (memory) that changes over time, unlike a feedforward network, which has no such state. This internal state lets RNNs process sequences of inputs and provide the most probable output at each step.
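
The recurrence is easy to see in a few lines of NumPy (a toy sketch; all sizes and names are illustrative): the same weights are reused at every time step, and the hidden state h carries information forward through the sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
Wxh = rng.standard_normal((n_hid, n_in)) * 0.1   # input -> hidden
Whh = rng.standard_normal((n_hid, n_hid)) * 0.1  # hidden -> hidden (the cycle)
b = np.zeros(n_hid)

def rnn_forward(inputs):
    """Process a sequence; the hidden state acts as the network's memory."""
    h = np.zeros(n_hid)
    states = []
    for x in inputs:                        # one step per sequence element
        h = np.tanh(Wxh @ x + Whh @ h + b)  # new state depends on old state
        states.append(h)
    return states

sequence = [rng.standard_normal(n_in) for _ in range(4)]
for t, h in enumerate(rnn_forward(sequence)):
    print(t, h[:2])  # the hidden state evolves with each input
```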

RNNs include Long Short-Term Memory (LSTM) networks, invented by Hochreiter and Schmidhuber, and are widely used for applications such as Speech Recognition and Handwriting Recognition.
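
An LSTM extends the plain recurrence above with gates that control what the cell memory forgets, stores, and exposes. A minimal single-step sketch (batching and initialization details omitted; the per-gate weight matrices are folded into one W for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step: gates decide what to forget, store, and output."""
    z = W @ np.concatenate([x, h]) + b
    f, i, o, g = np.split(z, 4)          # forget, input, output, candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell memory
    h = sigmoid(o) * np.tanh(c)                   # expose filtered memory
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in [rng.standard_normal(n_in) for _ in range(3)]:
    h, c = lstm_step(x, h, c, W, b)
print(h)
```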

The term “recurrent” refers to this feedback behavior. RNNs fall into two broad classes, both of which exhibit temporal dynamic behavior: a finite impulse RNN is a directed acyclic graph, which can be unrolled and replaced by a strictly feedforward network, while an infinite impulse RNN is a directed cyclic graph, which cannot be unrolled.

Some subclasses of RNNs include:

  • Fully Recurrent Neural Networks
  • Elman Networks and Jordan Networks
  • Hopfield Networks
  • Echo State Networks
  • Independently Recurrent Neural Networks (IndRNNs)
  • Recursive Networks
  • Neural History Compressors
  • Second-Order RNNs
  • Long Short-Term Memory (LSTM)
  • Gated Recurrent Units (GRUs)
  • Continuous-Time RNNs
  • Recurrent Multilayer Perceptron Networks
  • Multiple Timescales Models
  • Bi-directional RNNs

4. Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of Deep Learning network that replaces general matrix multiplication with a mathematical operation called convolution in at least one of its layers. In mathematics, the convolution of two functions produces a third function by reversing and shifting one of them and integrating (or, for discrete data, summing) the product of the two.
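
For discrete data the integral becomes a sum. A short sketch that makes the "reverse and shift" explicit (the result matches NumPy's built-in np.convolve):

```python
import numpy as np

def convolve(f, g):
    """Discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]."""
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            if 0 <= n - k < len(g):       # g is reversed and shifted by n
                out[n] += f[k] * g[n - k]
    return out

signal = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.array([1.0, 0.0, -1.0])       # a simple edge-detecting filter
print(convolve(signal, kernel))           # [ 1.  2.  2.  2. -3. -4.]
print(np.convolve(signal, kernel))        # same result
```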

CNNs are inspired by biological neural connectivity, particularly in the visual cortex of animals. They are widely used for image recognition and video classification, and because their filters are shared across the whole input, they need far fewer parameters than fully connected networks of comparable depth.

A CNN typically has one input and one output layer, with multiple hidden layers in between that classify the input image. The hidden layers generally consist of a series of convolutional layers, each computing dot products between a learned filter and patches of the input, commonly followed by a ReLU activation, and the network also includes pooling, normalization, and fully connected layers.
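
The following toy forward pass strings these pieces together (sizes and weights are illustrative; note that, as in most deep learning libraries, the "convolution" here is technically cross-correlation, since the kernel is learned anyway and flipping it changes nothing in practice):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Downsample by taking the max over non-overlapping windows."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                          # toy grayscale "image"
kernel = rng.standard_normal((3, 3)) * 0.1          # one convolutional filter
features = np.maximum(0.0, conv2d(image, kernel))   # convolution + ReLU
pooled = max_pool(features)                         # 6x6 -> 3x3
W_fc = rng.standard_normal((2, pooled.size)) * 0.1  # fully connected layer
print(W_fc @ pooled.ravel())                        # two class scores
```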

Deep Learning Applications:

  • Image Classification
  • Object Detection
  • Computer Vision
  • Voice Recognition
  • Handwriting Recognition

Automatic Speech Recognition results since 1991, summarized as Percent Phone Error Rate (PER):

Method                                               PER (%)
Randomly Initialized RNN                             26.1
Bayesian Triphone GMM-HMM                            25.6
Hidden Trajectory (Generative) Model                 24.8
Monophone Randomly Initialized DNN                   23.4
Monophone DBN-DNN                                    22.4
Triphone GMM-HMM with BMMI Training                  21.7
Monophone DBN-DNN on fbank                           20.7
Convolutional DNN                                    20.0
Convolutional DNN with Heterogeneous Pooling         18.7
Ensemble DNN/CNN/RNN                                 18.3
Bidirectional LSTM                                   17.9
Hierarchical Convolutional Deep Max-out Network      16.5

Explore more about Robotics, Machine Learning, Deep Learning, AI, and Data Analytics at Kritrim Intelligence.