**NPTEL Deep Learning – IIT Ropar Assignment Answer**

## NPTEL Deep Learning – IIT Ropar Week 11 Assignment Answer 2023

**1. Which of the following is a limitation of traditional feedforward neural networks in handling sequential data?**

They can only process fixed-length input sequences

They can handle variable-length input sequences

They can’t model temporal dependencies between sequential data

They are not affected by the order of input sequences

Answer :-a, c

**2. Which of the following is a common architecture used for sequence learning in deep learning?**

Convolutional Neural Networks (CNNs)

Autoencoders

Recurrent Neural Networks (RNNs)

Generative Adversarial Networks (GANs)

Answer :-c

**3. What is the vanishing gradient problem in training RNNs?**

The weights of the network converge to zero during training

The gradients used for weight updates become too large

The gradients used for weight updates become too small

The network becomes overfit to the training data

[ihc-hide-content ihc_mb_type=”show” ihc_mb_who=”1,2,3″ ihc_mb_template=”1″ ]

Answer :-c

**4. Which of the following is the main disadvantage of using BPTT?**

It is computationally expensive.

It is difficult to implement.

It requires a large amount of data.

It is prone to overfitting.

Answer :-a

**5. Arrange the following sequence in the order they are performed by LSTM at time step t. [Selectively read, Selectively write, Selectively forget]**

Selectively read, Selectively write, Selectively forget

Selectively write, Selectively read, Selectively forget

Selectively read, Selectively forget, Selectively write

Selectively forget, Selectively write, Selectively read

Answer :-a

**6. What are the problems in the RNN architecture?**

Morphing of information stored at each time step.

Exploding and Vanishing gradient problem.

Errors caused at time step tn can’t be related to previous time steps that are far away.

All of the above

Answer :-d

**7. What is the purpose of the forget gate in an LSTM network?**

To decide how much of the cell state to keep from the previous time step

To decide how much of the current input to add to the cell state

To decide how much of the current cell state to output

To decide how much of the current input to output

Answer :-a

Answer :-c

**9. How many neurons are in the hidden layer at state s2 of the RNN?**

6

2

9

4

Answer :-d

Answer :-a

[/ihc-hide-content]

Course Name | Deep Learning – IIT Ropar |

Category | NPTEL Assignment Answer |

## NPTEL Deep Learning – IIT Ropar Week 10 Assignment Answer 2023

**1. Which of the following architectures has the highest number of layers?**

AlexNet

GoogleNet

VGG

ResNet

Answer :- d

**2. Consider a convolution operation with an input image of size 100x100x3 and a filter of size 8x8x3, using a stride of 1 and a padding of 1. What is the output size?**

100x100x3

98x98x1

102x102x3

95x95x1

Answer :-d
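The result can be checked with the usual output-size formula, floor((W − F + 2P) / S) + 1 per spatial dimension, with one output channel per filter. The sketch below is only an illustration; `conv_output_size` is a hypothetical helper name, not something from the course material.

```python
def conv_output_size(width, filter_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (width - filter_size + 2 * padding) // stride + 1

# Question 2: 100x100x3 input, a single 8x8x3 filter, stride 1, padding 1.
side = conv_output_size(100, 8, stride=1, padding=1)
print(f"{side}x{side}x1")  # one filter gives one output channel
```

The depth of the filter (3) matches the depth of the input, so it does not appear in the spatial calculation.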

**3. Consider a convolution operation with an input image of size 256x256x3 and 40 filters of size 11x11x3, using a stride of 4 and a padding of 2. What is the height of the output size?**

63

64

40

3

Answer :-a

**4. Which statement is true about the number of filters in CNNs?**

More filters lead to better accuracy.

Fewer filters lead to better accuracy.

The number of filters has no effect on accuracy.

The number of filters only affects the computation time.

Answer :-a

**5. Which of the following statements is true regarding the occlusion experiment in a CNN?**

It is used to determine the importance of each feature map in the output of the network.

It involves masking a portion of the input image with a patch of zeroes.

It is a technique used to prevent overfitting in deep learning models.

It is used to increase the number of filters in a convolutional layer.

Answer :-b

**6. Which of the following is an innovation introduced in GoogleNet architecture?**

1×1 convolutions to reduce the dimension

ReLU activation function

Dropout regularization

use of different-sized filters for the same input

Answer :-a, d

**7. What is the purpose of guided backpropagation in CNNs?**

To visualize which pixels in an image are most important for a particular class prediction.

To train the CNN to improve its accuracy on a given task.

To reduce the size of the input images in order to speed up computation.

None of the above.

Answer :-a

**8. Which layer in a CNN is used for guided backpropagation?**

Input layer

Convolutional layer

Activation layer

Pooling layer

Answer :-c

**9. Which of the following is a technique used to fool CNNs in Deep Learning?**

Adversarial examples

Transfer learning

Dropout

Batch normalization

Answer :-a

**10. We have a trained CNN. We have the picture on the left which when fed into the network as input is given the label ’HUMAN’ with high probability. The picture on the right is the same image with some added noise. If we feed the right image as input to the CNN then which of the following statements is True?**

CNN will detect the image as ‘HUMAN’

CNN will not detect the image as ‘HUMAN’ since noise is added to the image.

CNN will detect the image as ‘HUMAN’ but with a lower probability than the left image.

Insufficient information to say anything

Answer :-a

## NPTEL Deep Learning – IIT Ropar Week 9 Assignment Answer 2023

**1. Which of the following is a disadvantage of one hot encoding?**

It requires a large amount of memory to store the vectors

It can result in a high-dimensional sparse representation

It cannot capture the semantic similarity between words

All of the above

Answer :-d

**2. Which of the following is true about the input representation in the CBOW model?**

Each word is represented as a one-hot vector

Each word is represented as a continuous vector

Each word is represented as a sequence of one-hot vectors

Each word is represented as a sequence of continuous vectors

Answer :-b

**3. Which of the following is an advantage of the CBOW model compared to the Skip-gram model?**

It is faster to train

It requires less memory

It performs better on rare words

All of the above

Answer :-a

**4. Which of the following is an advantage of using the skip-gram method over the bag-of-words approach?**

The skip-gram method is faster to train

The skip-gram method performs better on rare words

The bag-of-words approach is more accurate

The bag-of-words approach is better for short texts

Answer :-b

**5. What is the role of the softmax function in the skip-gram method?**

To calculate the dot product between the target word and the context words

To transform the dot product into a probability distribution

To calculate the distance between the target word and the context words

To adjust the weights of the neural network during training

Answer :-b

**6. Suppose we are learning the representations of words using GloVe representations. If we observe that the cosine similarity between two representations vi and vj for words ‘i’ and ‘j’ is very high, which of the following statements is true? (Parameters bi = 0.02 and bj = 0.05.)**

Xij=0.03.

Xij=0.8.

Xij=0.35.

Xij=0.

Answer :-b

**7. We add incorrect pairs into our corpus to maximize the probability of words that occur in the same context and minimize the probability of words that occur in different contexts. This technique is called-**

Hierarchical softmax

Contrastive estimation

Negative sampling

GloVe representations

Answer :-c

**8. What is the computational complexity of computing the softmax function in the output layer of a neural network?**

O(n)

O(n^2)

O(n log n)

O(log n)

Answer :-a
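To see why the cost is linear in the number of output classes n, it helps to write softmax out: each step below is a single pass over the n scores. This is a minimal sketch in plain Python; the function name and the example scores are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of n scores (a few O(n) passes)."""
    peak = max(scores)                           # pass 1: find the maximum
    exps = [math.exp(s - peak) for s in scores]  # pass 2: exponentiate
    total = sum(exps)                            # pass 3: normalizer
    return [e / total for e in exps]             # pass 4: divide

probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

For word embeddings n is the vocabulary size, which is why this normalizer becomes the bottleneck that hierarchical softmax and negative sampling try to avoid.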

**9. How does Hierarchical Softmax reduce the computational complexity of computing the softmax function?**

- It replaces the softmax function with a linear function
- It uses a binary tree to approximate the softmax function
- It uses a heuristic to compute the softmax function faster
- It does not reduce the computational complexity of computing the softmax function

Answer :-b

**10. What is the disadvantage of using Hierarchical Softmax?**

- It requires more memory to store the binary tree
- It is slower than computing the softmax function directly
- It is less accurate than computing the softmax function directly
- It is more prone to overfitting than computing the softmax function directly

Answer :-b

## NPTEL Deep Learning – IIT Ropar Week 8 Assignment Answer 2023

**1. Which of the following best describes the concept of saturation in deep learning?**

- When the activation function output approaches either 0 or 1 and the gradient is close to zero.
- When the activation function output is very small and the gradient is close to zero.
- When the activation function output is very large and the gradient is close to zero.
- None of the above.

Answer :- a, b, c

**2. Which of the following methods can help to avoid saturation in deep learning?**

- Using a different activation function.
- Increasing the learning rate.
- Increasing the model complexity
- All of the above.

Answer :-a

**3. Which of the following is true about the role of unsupervised pre-training in deep learning?**

- It is used to replace the need for labeled data
- It is used to initialize the weights of a deep neural network
- It is used to fine-tune a pre-trained model
- It is only useful for small datasets

Answer :-b

**4. Which of the following is an advantage of unsupervised pre-training in deep learning?**

- It helps in reducing overfitting
- Pre-trained models converge faster
- It improves the accuracy of the model
- It requires fewer computational resources

Answer :-b, c

**5. What is the main cause of the Dead ReLU problem in deep learning?**

- High variance
- High negative bias
- Overfitting
- Underfitting

Answer :-b

**6. How can you tell if your network is suffering from the Dead ReLU problem?**

- The loss function is not decreasing during training
- The accuracy of the network is not improving
- A large number of neurons have zero output
- The network is overfitting to the training data

Answer :-c

**7. What is the mathematical expression for the ReLU activation function?**

- f(x) = x if x < 0, 0 otherwise
- f(x) = 0 if x > 0, x otherwise
- f(x) = max(0,x)
- f(x) = min(0,x)

Answer :-c
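A small sketch ties the ReLU definition to the Dead ReLU questions above. The weight, bias, and inputs are made-up values chosen only to show a unit whose pre-activation is always negative.

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)

# Hypothetical neuron with a large negative bias: every pre-activation is
# negative, so the ReLU output is 0 for all of these inputs (a "dead" unit).
weight, bias = 0.5, -10.0
outputs = [relu(weight * x + bias) for x in (-2.0, 0.0, 3.0, 8.0)]
print(outputs)
```

Because the output is zero the gradient through the unit is zero as well, so the weights feeding it stop updating.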

**8. What is the main cause of the symmetry breaking problem in deep learning?**

- High variance
- High bias
- Overfitting
- Equal initialization of weights

Answer :-d

**9. What is the purpose of Batch Normalization in Deep Learning?**

- To improve the generalization of the model
- To reduce overfitting
- To reduce bias in the model
- To ensure that the distribution of the inputs at different layers doesn’t change

Answer :-d

**10. In Batch Normalization, which parameter is learned during training?**

- Mean
- Variance
- γ
- ϵ

Answer :-c

## NPTEL Deep Learning – IIT Ropar Week 7 Assignment Answer 2023

**1. Which of the following statements is true about the bias-variance tradeoff in deep learning?**

- Increasing the learning rate reduces bias
- Increasing the learning rate reduces variance
- Decreasing the learning rate reduces bias
- None of These

Answer :-d

**2. Which of the following statements is true about the bias-variance tradeoff in deep learning?**

- Increasing the size of the training dataset reduces bias
- Increasing the size of the training dataset reduces variance
- Decreasing the size of the training dataset reduces bias
- Decreasing the size of the training dataset reduces variance

Answer :-b

**3. What is the effect of high bias on a model’s performance?**

- The model will overfit the training data.
- The model will underfit the training data.
- The model will be unable to learn anything from the training data.
- The model’s performance will be unaffected by bias.

Answer :-b

**4. What is the usual relationship between train error and test error?**

- Train error is usually higher than test error
- Train error is usually lower than test error
- Train error and test error are usually the same
- Train error and test error are unrelated

Answer :-b

**5. What is overfitting in deep learning?**

- When the model performs well on the training data but poorly on new, unseen data
- When the model performs poorly on the training data and on new, unseen data
- When the model has a high test error and a low train error
- When the model has a low test error and a high train error

Answer :-a, c

**6. How can overfitting be prevented in deep learning?**

- By increasing the complexity of the model
- By decreasing the size of the training data
- By adding more layers to the model
- By using regularization techniques such as dropout

Answer :-d

**7. Which of the following statements is true about L2 regularization?**

- It adds a penalty term to the loss function that is proportional to the absolute value of the weights.
- It adds a penalty term to the loss function that is proportional to the square of the weights.
- It gives us sparse solutions for w.
- It is equivalent to adding Gaussian noise to the weights.

Answer :-b, d

**8. Which of the following regularization techniques is likely to produce a sparse weight vector?**

- L1 regularization
- L2 regularization
- Dropout
- Data augmentation

Answer :-a

**9. We trained different models on data and then we used the bagging technique. We observe that our test error reduces drastically after using bagging. Choose the correct options.**

- All models had the same hyperparameters and were trained on the same features
- All the models were correlated.
- All the models were uncorrelated (independent).
- All of these.

Answer :-c

Answer :-a, c

## NPTEL Deep Learning – IIT Ropar Week 6 Assignment Answer 2023

**1. What is the main purpose of a hidden layer in an under-complete autoencoder?**

- To increase the number of neurons in the network
- To reduce the number of neurons in the network
- To limit the capacity of the network
- None of These

Answer :-c

**2. Which of the following problems prevents us from using autoencoders for the task of Image compression?**

- Images are not allowed as input to autoencoders
- Difficulty in training deep neural networks
- Loss of image quality due to compression
- Autoencoders are not capable of producing image output

Answer :-c

**3. Which of the following is a potential advantage of using an overcomplete autoencoder?**

- Reduction of the risk of overfitting
- Ability to learn more complex and nonlinear representations
- Faster training time
- To compress the input data

Answer :-b

**4. What is/are the primary advantages of Autoencoders over PCA?**

- Autoencoders are less prone to overfitting than PCA.
- Autoencoders are faster and more efficient than PCA.
- Autoencoders require fewer input data than PCA.
- Autoencoders can capture nonlinear relationships in the input data.

Answer :-d

**5. Which of the following is a potential disadvantage of using autoencoders for dimensionality reduction over PCA?**

- Autoencoders are computationally expensive and may require more training data than PCA.
- Autoencoders are bad at capturing complex relationships in data
- Autoencoders may overfit the training data and generalize poorly to new data.
- Autoencoders are unable to handle linear relationships between data.

Answer :-a, c

**6. What is the primary objective of sparse autoencoders that distinguishes it from vanilla autoencoder?**

- They learn a low-dimensional representation of the input data
- They minimize the reconstruction error between the input and the output
- They capture only the important variations/features in the data
- They maximize the mutual information between the input and the output

Answer :-c

**7. Which of the following networks represents an autoencoder?**

Answer :-c

**8. If the dimension of the hidden layer representation is more than the dimension of the input layer, then what kind of autoencoder do we have?**

- Complete autoencoder
- Under-complete autoencoder
- Overcomplete autoencoder
- Sparse autoencoder

Answer :-c

**9. Suppose for one data point we have features x1,x2,x3,x4,x5 as −2, 12, 4.2, 7.6, 0. Which of the following functions should we use on the output layer (decoder)?**

- Logistic
- ReLU
- Tanh
- Linear

Answer :-d

**10. If the dimension of the input layer in an under-complete autoencoder is 6, what is the possible dimension of the hidden layer?**

- 6
- 2
- 8
- 0

Answer :-b

## NPTEL Deep Learning – IIT Ropar Week 5 Assignment Answer 2023

**1. Which of the following is a property of eigenvalues of a symmetric matrix?**

- Eigenvalues are always positive
- Eigenvalues are always real
- Eigenvalues are always negative
- Eigenvalues can be complex numbers with imaginary part non zero

Answer :- b

**2. What is the determinant of a matrix with eigenvalues λ1 and λ2?**

- λ1 + λ2
- λ1 – λ2
- λ1 * λ2
- λ1 / λ2

Answer :-C
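The identity det(A) = λ1 · λ2 can be checked numerically on a small example. The sketch below uses the closed-form eigenvalues of a 2×2 matrix; the matrix [[2, 1], [1, 2]] is a made-up example, not the one from the assignment.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]], assuming they are real."""
    trace = a + d
    det = a * d - b * c
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# Hypothetical symmetric example matrix [[2, 1], [1, 2]].
lam1, lam2 = eigenvalues_2x2(2, 1, 1, 2)
det = 2 * 2 - 1 * 1
print(lam1, lam2, lam1 * lam2, det)
```

The same check shows the companion fact that the sum of the eigenvalues equals the trace.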

**3. Which of the following is a measure of the amount of variance explained by a principal component in PCA?**

- Eigenvalue
- Covariance
- Correlation
- Mean absolute deviation

Answer :-A

**4. What is the mean of the given data points x1,x2,x3?**

- [3 3]
- [0 0]
- [1 1]
- [0.5 0.5]

Answer :-C

Answer :-D

**6. The maximum eigenvalue of the covariance matrix C is:**

- 1/3
- 4/3
- 1/6
- 1/2

Answer :-B

**7. The eigenvector corresponding to the maximum eigenvalue of the given matrix C is:**

Answer :-B

Answer :-A

**9. What is the covariance between height and weight in the given dataset?(Use the formula)**

- 121.2
- 89.6
- 62.6
- 74

Answer :-C

**10. What is the correlation between height and weight in the given dataset**

- 0.7
- 1
- 0.96
- 0.59

Answer :-C

## NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023

**1. Which step does Nesterov accelerated gradient descent perform before finding the update size?**

- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size

Answer :-b

**2. Select the parameter of vanilla gradient descent that controls the step size in the direction of the gradient.**

- Learning rate
- Momentum
- Gamma
- None of the above

Answer :-a

**3. What does the distance between two contour lines on a contour map represent?**

- The change in the output of function
- The direction of the function
- The rate of change of the function
- None of the above

Answer :-c

**4. Which of the following represents the contour plot of the function f(x,y) = x^{2}−y?**

Answer :-c

**5. What is the main advantage of using Adagrad over other optimization algorithms?**

- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.

Answer :-b

**6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes for the following phenomenon?**

- η is large
- ∇w is small
- ∇w is large
- η is small

Answer :-b, d

**7. You are given labeled data which we call X where rows are data points and columns feature. One column has most of its values as 0. What algorithm should we use here for faster convergence and achieve the optimal value of the loss function?**

- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent

Answer :-b

**8. What is the update rule for the ADAM optimizer?**

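- $w_t = w_{t-1} - lr \cdot \dfrac{m_t}{\sqrt{v_t} + \epsilon}$
- $w_t = w_{t-1} - lr \cdot m$
- $w_t = w_{t-1} - lr \cdot \dfrac{m_t}{v_t + \epsilon}$
- $w_t = w_{t-1} - lr \cdot \dfrac{v_t}{m_t + \epsilon}$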

Answer :-a
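As a rough illustration of the selected rule, here is one Adam step for a single scalar weight. The sketch follows the commonly published formulation, which also applies bias correction to the moment estimates (the quiz option leaves that implicit); the hyperparameter defaults and the example gradient are assumptions.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=2.0, m=m, v=v, t=1)
print(w)
```

On the first step the corrected moments reduce to the gradient and its square, so the weight moves by roughly `lr` regardless of the gradient's magnitude.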

**9. What is the advantage of using mini-batch gradient descent over batch gradient descent?**

- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.

Answer :-a, d

**10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?**

- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad

Answer :-c

## NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023

1. Which of the following statements about backpropagation is true?

- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.

Answer:- a

2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?

Answer:- b. −(y log p + (1−y) log(1−p))
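The formula can be tried on a couple of values to see how it penalizes confident wrong predictions. This is a minimal sketch assuming 0 < p < 1 and the natural logarithm; the function name is illustrative.

```python
import math

def binary_cross_entropy(y, p):
    """Binary cross entropy for a true label y in {0, 1} and prediction 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

good = binary_cross_entropy(1, 0.9)  # confident and correct: small loss
bad = binary_cross_entropy(1, 0.1)   # confident and wrong: large loss
print(good, bad)
```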

3. Let y_i be the true class label of the i-th instance and p_i be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for multi-class cross entropy loss.

Answer:- c. $-\sum_{c=1}^{M} y_{o,c}\log(p_{o,c})$

4. Can cross-entropy loss be negative between two probability distributions?

- Yes
- No

Answer:- b

5. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?

- p=q
- All the values in p are lower than corresponding values in q
- All the values in p are lower than corresponding values in q
- p = 0 [0 is a vector]

Answer:- a

6. Which of the following is false about cross-entropy loss between two probability distributions?

It is always in range (0,1)

It can be negative.

It is always positive.

It can be 1.

Answer:- a, b

7. The probability of all the events x1, x2, …, xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (base of log is 2)

- H(X)≤1
- H(X)=1
- H(X)≥1
- We can’t say anything conclusive with the provided information.

Answer:- c
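With n equally likely events the entropy is log2(n), which is at least 1 bit once n > 1. A short sketch, with `entropy_bits` as a hypothetical helper name:

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform distributions over n = 2, 4, 8 events.
uniform_entropies = {n: entropy_bits([1 / n] * n) for n in (2, 4, 8)}
print(uniform_entropies)
```

The smallest case, n = 2, gives exactly 1 bit, and the value only grows as n increases.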

8. Suppose we have a problem where data x and label y are related by y = x^{4} + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?

- Linear
- Relu
- Sigmoid
- tan^{−1}(x)

Answer:- a

9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?

- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content

Answer:- c, d
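Information content is −log2 of the event's probability, so the rarer event carries more bits. A quick sketch using the two probabilities from the question (the helper name is an assumption):

```python
import math

def information_bits(p):
    """Information content (surprisal) in bits of an event with probability p."""
    return -math.log2(p)

info_a = information_bits(0.95)  # likely event: little information
info_b = information_bits(0.05)  # rare event: much more information
print(info_a, info_b)
```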

10. Which of the following activation functions can only give positive outputs greater than 0?

- Sigmoid
- ReLU
- Tanh
- Linear

Answer:- a

## NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023

**1. What is the range of the sigmoid function σ(x) = 1/(1+e^{−x})?**

- (−1,1)
- (0,1)
- (−∞,∞)
- (0,∞)

Answer :- (0, 1)

The sigmoid function σ(x) = 1 / (1 + e^(-x)) outputs values between 0 and 1. As x approaches positive infinity, the value of σ(x) approaches 1, and as x approaches negative infinity, the value of σ(x) approaches 0. Therefore, the range of the sigmoid function is between 0 and 1, but it never actually reaches 0 or 1.

**2. What happens to the output of the sigmoid function as |x| becomes very small?**

- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.

Answer :- The output approaches 0.5

As the absolute value of x becomes very small (close to 0), the exponential term e^(-x) in the sigmoid function becomes very close to 1. As a result, the denominator of the sigmoid function (1 + e^(-x)) becomes approximately 2. This leads to the output of the sigmoid function approaching 1/2, which is 0.5. So, as |x| becomes very small, the output of the sigmoid function approaches 0.5.
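Both sigmoid answers above can be checked numerically. The sample inputs below are arbitrary choices for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

near_zero = [sigmoid(x) for x in (-1e-3, 0.0, 1e-3)]  # all close to 0.5
extremes = [sigmoid(x) for x in (-20.0, 20.0)]        # close to 0 and 1
print(near_zero, extremes)
```

For inputs of moderate size like these, the outputs stay strictly between 0 and 1, matching the open interval (0, 1).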

**3. Which of the following theorems states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?**

- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem

Answer :- Universal approximation theorem

The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer containing a finite number of neurons (units) can approximate any continuous function to arbitrary accuracy, given a sufficiently large number of neurons in that hidden layer. This theorem highlights the powerful approximation capabilities of neural networks.

**4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?**

- 301
- 451
- 150
- 500

Answer :-301

**5. A neural network has two hidden layers with 5 neurons in each layer, an output layer with 3 neurons, and an input layer with 2 neurons. How many weights are there in total? (Don’t assume any bias terms in the network.)**

Answer :- 50
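The count comes from multiplying the sizes of each pair of consecutive layers in a fully connected network and summing the products. A short sketch, assuming dense connections between consecutive layers and no biases, as the question states:

```python
# Layer sizes in order: input, hidden 1, hidden 2, output.
layer_sizes = [2, 5, 5, 3]

# Each pair of consecutive layers contributes (fan_in * fan_out) weights.
per_layer = [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]
total_weights = sum(per_layer)
print(per_layer, total_weights)
```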

**6. What is the derivative of the ReLU activation function with respect to its input at 0?**

- 0
- 1
- −1
- Not differentiable

Answer :-Not differentiable

**7. Consider a function f(x)=x^{3}−3x^{2}+2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?**

Answer :-1.85,1.95
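The three updates can be reproduced directly with f'(x) = 3x² − 6x and the rule x ← x − η·f'(x). The sketch below assumes plain (vanilla) gradient descent; the result lands near 1.90, inside the 1.85 to 1.95 interval quoted in the answer.

```python
def grad(x):
    """Derivative of f(x) = x^3 - 3x^2 + 2."""
    return 3 * x * x - 6 * x

x = 4.0              # initial value from the question
learning_rate = 0.1
history = [x]
for _ in range(3):   # three gradient descent iterations
    x = x - learning_rate * grad(x)
    history.append(x)
print(history)
```

The iterates are approximately 4 → 1.6 → 1.792 → 1.904, moving toward the local minimum of f at x = 2.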

**8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?**

- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.

Answer :- A multilayer network of sigmoid neurons can represent any continuous function.

This statement reflects the universal approximation theorem, which states that a feedforward neural network with a single hidden layer containing a finite number of sigmoid (or similar activation function) neurons can approximate any continuous function to arbitrary accuracy, given a sufficiently large number of neurons in the hidden layer.

**9. How many boolean functions can be designed for 3 inputs?**

- 65,536
- 82
- 256
- 64

Answer :-256

**10. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)**

- 16
- 64
- 16
- 32

Answer :-64
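Both counts in questions 9 and 10 come from the size of the truth table: n binary inputs give 2^n rows, each row can be mapped to 0 or 1 independently, and a single hidden layer with one neuron per row is enough in the worst case. A small sketch with hypothetical helper names:

```python
def num_boolean_functions(n_inputs):
    """Number of distinct boolean functions of n binary inputs: 2^(2^n)."""
    return 2 ** (2 ** n_inputs)

def worst_case_hidden_neurons(n_inputs):
    """One hidden neuron per truth-table row: 2^n."""
    return 2 ** n_inputs

functions_3 = num_boolean_functions(3)
hidden_6 = worst_case_hidden_neurons(6)
print(functions_3, hidden_6)
```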

## NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023

**1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?**

- Yes
- No
- Cannot be determined from the given information

Answer :- a. Yes

**2. What is the perceptron algorithm used for?**

- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems

Answer :- c. Classifying data

The perceptron algorithm is a type of supervised learning algorithm used for binary classification tasks. It takes an input vector and assigns it to one of two possible categories or classes based on a linear combination of the input features and associated weights. The perceptron algorithm is particularly effective when the data is linearly separable, as it tries to find a hyperplane that can separate the two classes.

**3. What is the most common activation function used in perceptrons?**

- Sigmoid
- ReLU
- Tanh
- Step

Answer :- d. Step

The step function is a type of activation function that takes an input and returns 1 if the input is greater than or equal to a threshold value, and 0 otherwise. It is one of the simplest activation functions used in early versions of perceptrons.

**4. Which of the following Boolean functions cannot be implemented by a perceptron?**

- AND
- OR
- XOR
- NOT

Answer :-XOR

**5. We are given 4 points in R2, say x1=(0,1), x2=(−1,−1), x3=(2,3), x4=(4,−5). The labels of x1, x2, x3, x4 are given to be −1, 1, −1, 1. We initiate the perceptron algorithm with an initial weight w0=(0,0) on this data. What will be the value of w0 after the algorithm converges? (Take points in sequential order from x1 to x4.) (An update happens when the value of the weight changes.)**

- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)

Answer :- (−2,−3)

**6. We are given the following data:**

Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while training)

- Yes
- No

Answer :- b. No

**7. Suppose we have a boolean function that takes 5 inputs x1,x2,x3,x4,x5? We have an MP neuron with parameter θ=1. For how many inputs will this MP neuron give output y=1?**

- 21
- 31
- 30
- 32

Answer :- b. 31
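The count can be confirmed by brute force: an MP neuron with threshold θ = 1 fires whenever at least one input is 1, which excludes only the all-zero input. A minimal sketch, assuming no inhibitory inputs:

```python
from itertools import product

def mp_neuron(inputs, theta):
    """McCulloch-Pitts neuron: outputs 1 when the input sum reaches theta."""
    return 1 if sum(inputs) >= theta else 0

# Enumerate all 2^5 = 32 binary input combinations for theta = 1.
all_inputs = list(product([0, 1], repeat=5))
firing = sum(mp_neuron(x, theta=1) for x in all_inputs)
print(len(all_inputs), firing)
```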

**8. Which of the following best represents the meaning of term “Artificial Intelligence”?**

- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines

Answer :- a. The ability of a machine to perform tasks that normally require human intelligence.

Artificial Intelligence (AI) refers to the capability of machines or computer systems to perform tasks that typically require human intelligence, such as problem-solving, learning, reasoning, understanding natural language, and adapting to new situations. AI aims to create machines that can simulate human-like intelligence and behavior, enabling them to perform complex tasks and make decisions without direct human intervention.

**9. Which of the following statements is true about error surfaces in deep learning?**

- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.

Answer :- They can have multiple local minima.

Error surfaces in deep learning, also known as loss surfaces or cost functions, represent the relationship between the model's parameters (weights and biases) and the error (or loss) of the model on the training data. These surfaces are typically non-convex, meaning they can have multiple local minima, maxima, and saddle points. Local minima are points where the error is relatively low compared to the neighboring points, but they may not be the global minimum, which represents the best set of parameters for the model.

**10. What is the output of the following MP neuron for the AND Boolean function?**

$$y = \begin{cases} 1, & \text{if } x_1 + x_2 + x_3 \ge 1 \\ 0, & \text{otherwise} \end{cases}$$

- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)

Answer :- a. y=1 for (x1,x2,x3)=(0,1,1) and c. y=1 for (x1,x2,x3)=(1,1,1)
