NPTEL Deep Learning – IIT Ropar Assignment Answers 2023
NPTEL Deep Learning – IIT Ropar Week 11 Assignment Answer 2023
1. Which of the following is a limitation of traditional feedforward neural networks in handling sequential data?
They can only process fixed-length input sequences
They can handle variable-length input sequences
They can’t model temporal dependencies between sequential data
They are not affected by the order of input sequences
Answer :- a, b, d
2. Which of the following is a common architecture used for sequence learning in deep learning?
Convolutional Neural Networks (CNNs)
Autoencoders
Recurrent Neural Networks (RNNs)
Generative Adversarial Networks (GANs)
Answer :- c
3. What is the vanishing gradient problem in training RNNs?
The weights of the network converge to zero during training
The gradients used for weight updates become too large
The gradients used for weight updates become too small
The network becomes overfit to the training data
Answer :- c
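A quick sketch of why this happens: in backpropagation through time, the gradient that reaches an early time step is a product of many Jacobians, so if their norms are below 1 the product shrinks exponentially with the distance between time steps.

```latex
\frac{\partial \mathcal{L}_t}{\partial h_k}
= \frac{\partial \mathcal{L}_t}{\partial h_t}
  \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}},
\qquad
\left\lVert \frac{\partial h_i}{\partial h_{i-1}} \right\rVert < 1
\;\Rightarrow\;
\frac{\partial \mathcal{L}_t}{\partial h_k} \to 0 \text{ as } t-k \text{ grows.}
```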
4. Which of the following is the main disadvantage of using BPTT?
It is computationally expensive.
It is difficult to implement.
It requires a large amount of data.
It is prone to overfitting.
Answer :- a
5. Arrange the following sequence in the order they are performed by LSTM at time step t.
[Selectively read, Selectively write, Selectively forget]
Selectively read, Selectively write, Selectively forget
Selectively write, Selectively read, Selectively forget
Selectively read, Selectively forget, Selectively write
Selectively forget, Selectively write, Selectively read
Answer :- a
6. What are the problems in the RNN architecture?
Morphing of information stored at each time step.
Exploding and Vanishing gradient problem.
Errors caused at time step tn can’t be related to previous time steps far away
All of the above
Answer :- d
7. What is the purpose of the forget gate in an LSTM network?
To decide how much of the cell state to keep from the previous time step
To decide how much of the current input to add to the cell state
To decide how much of the current cell state to output
To decide how much of the current input to output
Answer :- a
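For reference, here is one standard formulation of the LSTM cell-state update (notation varies across textbooks); the forget gate f_t is exactly the factor that decides how much of the previous cell state c_{t−1} is kept.

```latex
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f),
\qquad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```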
8. (Question shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/10/image-28.png)
Answer :- c
9. How many neurons are in the hidden layer at state s2 of the RNN?
6
2
9
4
Answer :- d
10. (Question shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/10/image-29.png)
Answer :- a
NPTEL Deep Learning – IIT Ropar Week 10 Assignment Answer 2023
1. Which of the following architectures has the highest number of layers?
AlexNet
GoogLeNet
VGG
ResNet
Answer :- d
2. Consider a convolution operation with an input image of size 100x100x3 and a filter of size 8x8x3, using a stride of 1 and a padding of 1. What is the output size?
100x100x3
98x98x1
102x102x3
95x95x1
Answer :- d
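A minimal sketch (assuming the usual floor convention for fractional positions) of the output-size formula out = ⌊(W − F + 2P)/S⌋ + 1, checked against this question's numbers:

```python
def conv_output_size(w, f, s, p):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1

# Question 2: 100x100 input, 8x8 filter, stride 1, padding 1
print(conv_output_size(100, 8, 1, 1))  # 95 -> one filter yields a 95x95x1 output
```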
3. Consider a convolution operation with an input image of size 256x256x3 and 40 filters of size 11x11x3, using a stride of 4 and a padding of 2. What is the height of the output size?
63
64
40
3
Answer :- b
4. Which statement is true about the number of filters in CNNs?
More filters lead to better accuracy.
Fewer filters lead to better accuracy.
The number of filters has no effect on accuracy.
The number of filters only affects the computation time.
Answer :- a
5. Which of the following statements is true regarding the occlusion experiment in a CNN?
It is used to determine the importance of each feature map in the output of the network.
It involves masking a portion of the input image with a patch of zeroes.
It is a technique used to prevent overfitting in deep learning models.
It is used to increase the number of filters in a convolutional layer.
Answer :- a, d
6. Which of the following is an innovation introduced in the GoogLeNet architecture?
1×1 convolutions to reduce the dimension
ReLU activation function
Dropout regularization
Use of different-sized filters for the same input
Answer :- a, d
7. What is the purpose of guided backpropagation in CNNs?
To visualize which pixels in an image are most important for a particular class prediction.
To train the CNN to improve its accuracy on a given task.
To reduce the size of the input images in order to speed up computation.
None of the above.
Answer :- a
8. Which layer in a CNN is used for guided backpropagation?
Input layer
Convolutional layer
Activation layer
Pooling layer
Answer :- c
9. Which of the following is a technique used to fool CNNs in Deep Learning?
Adversarial examples
Transfer learning
Dropout
Batch normalization
Answer :- a
10. We have a trained CNN. The picture on the left, when fed into the network as input, is given the label 'HUMAN' with high probability. The picture on the right is the same image with some added noise. If we feed the right image as input to the CNN, then which of the following statements is true?
CNN will detect the image as ‘HUMAN’
CNN will not detect the image as ‘HUMAN’ since noise is added to the image.
CNN will detect the image as ‘HUMAN’ but with a lower probability than the left image.
Insufficient information to say anything
Answer :- a
NPTEL Deep Learning – IIT Ropar Week 9 Assignment Answer 2023
1. Which of the following is a disadvantage of one hot encoding?
It requires a large amount of memory to store the vectors
It can result in a high-dimensional sparse representation
It cannot capture the semantic similarity between words
All of the above
Answer :- d
2. Which of the following is true about the input representation in the CBOW model?
Each word is represented as a one-hot vector
Each word is represented as a continuous vector
Each word is represented as a sequence of one-hot vectors
Each word is represented as a sequence of continuous vectors
Answer :- b
3. Which of the following is an advantage of the CBOW model compared to the Skip-gram model?
It is faster to train
It requires less memory
It performs better on rare words
All of the above
Answer :- a
4. Which of the following is an advantage of using the skip-gram method over the bag-of-words approach?
The skip-gram method is faster to train
The skip-gram method performs better on rare words
The bag-of-words approach is more accurate
The bag-of-words approach is better for short texts
Answer :- b
5. What is the role of the softmax function in the skip-gram method?
To calculate the dot product between the target word and the context words
To transform the dot product into a probability distribution
To calculate the distance between the target word and the context words
To adjust the weights of the neural network during training
Answer :- b
6. Suppose we are learning the representations of words using GloVe representations. If we observe that the cosine similarity between two representations vi and vj for words ‘i’ and ‘j’ is very high, which of the following statements is true? (parameters bi = 0.02 and bj = 0.05)
Xij=0.03.
Xij=0.8.
Xij=0.35.
Xij=0.
Answer :- b
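The reasoning, assuming the standard GloVe objective: the model fits each log co-occurrence count with the dot product of the two word vectors plus the two biases, so a very high similarity (large dot product) implies a large Xij, and 0.8 is the only large value among the options.

```latex
v_i^{\top} v_j + b_i + b_j \approx \log X_{ij}
```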
7. We add incorrect pairs into our corpus to maximize the probability of words that occur in the same context and minimize the probability of words that occur in different contexts. This technique is called:
Hierarchical softmax
Contrastive estimation
Negative sampling
GloVe representations
Answer :- c
8. What is the computational complexity of computing the softmax function in the output layer of a neural network?
O(n)
O(n²)
O(n log n)
O(log n)
Answer :- a
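A minimal numerically stable sketch; computing the softmax takes a constant number of O(n) passes over the n logits (max, exponentials, sum), which is where the O(n) complexity comes from:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: O(n) passes for the max, exp, and sum."""
    shifted = logits - np.max(logits)  # subtract max to avoid overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```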
9. How does Hierarchical Softmax reduce the computational complexity of computing the softmax function?
- It replaces the softmax function with a linear function
- It uses a binary tree to approximate the softmax function
- It uses a heuristic to compute the softmax function faster
- It does not reduce the computational complexity of computing the softmax function
Answer :- b
10. What is the disadvantage of using Hierarchical Softmax?
- It requires more memory to store the binary tree
- It is slower than computing the softmax function directly
- It is less accurate than computing the softmax function directly
- It is more prone to overfitting than computing the softmax function directly
Answer :- b
NPTEL Deep Learning – IIT Ropar Week 8 Assignment Answer 2023
1. Which of the following best describes the concept of saturation in deep learning?
- When the activation function output approaches either 0 or 1 and the gradient is close to zero.
- When the activation function output is very small and the gradient is close to zero.
- When the activation function output is very large and the gradient is close to zero.
- None of the above.
Answer :- a, b, c
2. Which of the following methods can help to avoid saturation in deep learning?
- Using a different activation function.
- Increasing the learning rate.
- Increasing the model complexity
- All of the above.
Answer :- a
3. Which of the following is true about the role of unsupervised pre-training in deep learning?
- It is used to replace the need for labeled data
- It is used to initialize the weights of a deep neural network
- It is used to fine-tune a pre-trained model
- It is only useful for small datasets
Answer :- b
4. Which of the following is an advantage of unsupervised pre-training in deep learning?
- It helps in reducing overfitting
- Pre-trained models converge faster
- It improves the accuracy of the model
- It requires fewer computational resources
Answer :- b, c
5. What is the main cause of the Dead ReLU problem in deep learning?
- High variance
- High negative bias
- Overfitting
- Underfitting
Answer :- b
6. How can you tell if your network is suffering from the Dead ReLU problem?
- The loss function is not decreasing during training
- The accuracy of the network is not improving
- A large number of neurons have zero output
- The network is overfitting to the training data
Answer :- c
7. What is the mathematical expression for the ReLU activation function?
- f(x) = x if x < 0, 0 otherwise
- f(x) = 0 if x > 0, x otherwise
- f(x) = max(0,x)
- f(x) = min(0,x)
Answer :- c
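A minimal NumPy sketch of ReLU and its (sub)gradient; the zero gradient for negative inputs is what drives the Dead ReLU problem from Questions 5 and 6:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient: 1 for x > 0, 0 for x < 0 (undefined at exactly 0)."""
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), relu_grad(x))  # [0. 0. 3.] [0. 0. 1.]
```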
8. What is the main cause of the symmetry breaking problem in deep learning?
- High variance
- High bias
- Overfitting
- Equal initialization of weights
Answer :- d
9. What is the purpose of Batch Normalization in Deep Learning?
- To improve the generalization of the model
- To reduce overfitting
- To reduce bias in the model
- To ensure that the distribution of the inputs at different layers doesn’t change
Answer :- d
10. In Batch Normalization, which parameter is learned during training?
- Mean
- Variance
- γ
- ϵ
Answer :- c
NPTEL Deep Learning – IIT Ropar Week 7 Assignment Answer 2023
1. Which of the following statements is true about the bias-variance tradeoff in deep learning?
- Increasing the learning rate reduces bias
- Increasing the learning rate reduces variance
- Decreasing the learning rate reduces bias
- None of These
Answer :- d
2. Which of the following statements is true about the bias-variance tradeoff in deep learning?
- Increasing the size of the training dataset reduces bias
- Increasing the size of the training dataset reduces variance
- Decreasing the size of the training dataset reduces bias
- Decreasing the size of the training dataset reduces variance
Answer :- b
3. What is the effect of high bias on a model’s performance?
- The model will overfit the training data.
- The model will underfit the training data.
- The model will be unable to learn anything from the training data.
- The model’s performance will be unaffected by bias.
Answer :- b
4. What is the usual relationship between train error and test error?
- Train error is usually higher than test error
- Train error is usually lower than test error
- Train error and test error are usually the same
- Train error and test error are unrelated
Answer :- b
5. What is overfitting in deep learning?
- When the model performs well on the training data but poorly on new, unseen data
- When the model performs poorly on the training data and on new, unseen data
- When the model has a high test error and a low train error
- When the model has a low test error and a high train error
Answer :- b, c
6. How can overfitting be prevented in deep learning?
- By increasing the complexity of the model
- By decreasing the size of the training data
- By adding more layers to the model
- By using regularization techniques such as dropout
Answer :- d
7. Which of the following statements is true about L2 regularization?
- It adds a penalty term to the loss function that is proportional to the absolute value of the weights.
- It adds a penalty term to the loss function that is proportional to the square of the weights.
- It gives us sparse solutions for w.
- It is equivalent to adding gaussian noise to the weights.
Answer :- b, d
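For reference, the L2-regularized loss; because the penalty is quadratic in the weights, gradient descent shrinks them smoothly toward zero without making them exactly zero (hence no sparsity, unlike L1).

```latex
\widetilde{\mathcal{L}}(w) = \mathcal{L}(w) + \frac{\lambda}{2}\,\lVert w \rVert_2^2
```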
8. Which of the following regularization techniques is likely to produce a sparse weight vector?
- L1 regularization
- L2 regularization
- Dropout
- Data augmentation
Answer :- a
9. We trained different models on data and then we used the bagging technique. We observe that our test error reduces drastically after using bagging. Choose the correct options.
- All models had the same hyperparameters and were trained on the same features
- All the models were correlated.
- All the models were uncorrelated(independent).
- All of these.
Answer :- c
10. (Question shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/09/image-32.png)
Answer :- a, c
NPTEL Deep Learning – IIT Ropar Week 6 Assignment Answer 2023
1. What is the main purpose of a hidden layer in an under-complete autoencoder?
- To increase the number of neurons in the network
- To reduce the number of neurons in the network
- To limit the capacity of the network
- None of These
Answer :- c
2. Which of the following problems prevents us from using autoencoders for the task of Image compression?
- Images are not allowed as input to autoencoders
- Difficulty in training deep neural networks
- Loss of image quality due to compression
- Auto encoders are not capable of producing image output
Answer :- c
3. Which of the following is a potential advantage of using an overcomplete autoencoder?
- Reduction of the risk of overfitting
- Ability to learn more complex and nonlinear representations
- Faster training time
- To compress the input data
Answer :- b
4. What is/are the primary advantages of Autoencoders over PCA?
- Autoencoders are less prone to overfitting than PCA.
- Autoencoders are faster and more efficient than PCA.
- Autoencoders require fewer input data than PCA.
- Autoencoders can capture nonlinear relationships in the input data.
Answer :- d
5. Which of the following is a potential disadvantage of using autoencoders for dimensionality reduction over PCA?
- Autoencoders are computationally expensive and may require more training data than PCA.
- Autoencoders are bad at capturing complex relationships in data
- Autoencoders may overfit the training data and generalize poorly to new data.
- Autoencoders are unable to handle linear relationships between data.
Answer :- a, b
6. What is the primary objective of sparse autoencoders that distinguishes them from a vanilla autoencoder?
- They learn a low-dimensional representation of the input data
- They minimize the reconstruction error between the input and the output
- They capture only the important variations/features in the data
- They maximize the mutual information between the input and the output
Answer :- c
7. Which of the following networks represents an autoencoder?
(Answer options shown as images in the original post: https://gecmunger.in/wp-content/uploads/2023/09/image-12.png and https://gecmunger.in/wp-content/uploads/2023/09/image-13.png)
Answer :- c
8. If the dimension of the hidden layer representation is more than the dimension of the input layer, then what kind of autoencoder do we have?
- Complete autoencoder
- Under-complete autoencoder
- Overcomplete autoencoder
- Sparse autoencoder
Answer :- c
9. Suppose for one data point we have the features x1, x2, x3, x4, x5 as −2, 12, 4.2, 7.6, 0. Which of the following functions should we use at the output layer (decoder)?
- Logistic
- Relu
- Tanh
- Linear
Answer :- d
10. If the dimension of the input layer in an under-complete autoencoder is 6, what is the possible dimension of the hidden layer?
- 6
- 2
- 8
- 0
Answer :- b
NPTEL Deep Learning – IIT Ropar Week 5 Assignment Answer 2023
1. Which of the following is a property of eigenvalues of a symmetric matrix?
- Eigenvalues are always positive
- Eigenvalues are always real
- Eigenvalues are always negative
- Eigenvalues can be complex numbers with imaginary part non zero
Answer :- b
2. What is the determinant of a matrix with eigenvalues λ1 and λ2?
- λ1 + λ2
- λ1 – λ2
- λ1 * λ2
- λ1 / λ2
Answer :- C
3. Which of the following is a measure of the amount of variance explained by a principal component in PCA?
- Eigenvalue
- Covariance
- Correlation
- Mean absolute deviation
Answer :- A
4. What is the mean of the given data points x1,x2,x3?
- [3 3]
- [0 0]
- [1 1]
- [0.5 0.5]
Answer :- C
5. (Question shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/08/image-55.png)
Answer :- D
6. The maximum eigenvalue of the covariance matrix C is:
- 1/3
- 4/3
- 1/6
- 1/2
Answer :- B
7. The eigenvector corresponding to the maximum eigenvalue of the given matrix C is:
(Answer options shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/08/image-56.png)
Answer :- B
8. (Question shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/08/image-57.png)
Answer :- A
9. What is the covariance between height and weight in the given dataset?(Use the formula)
(Dataset shown as an image in the original post: https://gecmunger.in/wp-content/uploads/2023/08/image-58.png)
- 121.2
- 89.6
- 62.6
- 74
Answer :- C
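The formula the question refers to is presumably the sample covariance (note that some treatments divide by n − 1 instead of n):

```latex
\operatorname{cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```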
10. What is the correlation between height and weight in the given dataset
- 0.7
- 1
- 0.96
- 0.59
Answer :- C
NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answer 2023
1. Which step does Nesterov accelerated gradient descent perform before finding the update size?
- Increase the momentum
- Estimate the next position of the parameters
- Adjust the learning rate
- Decrease the step size
Answer :- b
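A common way to write the Nesterov update (symbols here are illustrative; notation follows the usual momentum form): the gradient is evaluated at the look-ahead point w_{t−1} − γu_{t−1}, i.e. at the estimated next position of the parameters.

```latex
u_t = \gamma\, u_{t-1} + \eta\, \nabla_w \mathcal{L}\!\left(w_{t-1} - \gamma\, u_{t-1}\right),
\qquad
w_t = w_{t-1} - u_t
```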
2. Which parameter of vanilla gradient descent controls the step size in the direction of the gradient?
- Learning rate
- Momentum
- Gamma
- None of the above
Answer :- a
3. What does the distance between two contour lines on a contour map represent?
- The change in the output of the function
- The direction of the function
- The rate of change of the function
- None of the above
Answer :- c
4. Which of the following represents the contour plot of the function f(x,y) = x² − y?
Answer :- c
5. What is the main advantage of using Adagrad over other optimization algorithms?
- It converges faster than other optimization algorithms.
- It is less sensitive to the choice of hyperparameters (learning rate).
- It is more memory-efficient than other optimization algorithms.
- It is less likely to get stuck in local optima than other optimization algorithms.
Answer :- b
6. We are training a neural network using the vanilla gradient descent algorithm. We observe that the change in weights is small in successive iterations. What are the possible causes for the following phenomenon?
- η is large
- ∇w is small
- ∇w is large
- η is small
Answer :- b, d
7. You are given labeled data, which we call X, where rows are data points and columns are features. One column has most of its values as 0. Which algorithm should we use here for faster convergence and to achieve the optimal value of the loss function?
- NAG
- Adam
- Stochastic gradient descent
- Momentum-based gradient descent
Answer :- b
8. What is the update rule for the ADAM optimizer?
- wt = wt−1 − lr ∗ (mt / (√vt + ϵ))
- wt = wt−1 − lr ∗ mt
- wt = wt−1 − lr ∗ (mt / (vt + ϵ))
- wt = wt−1 − lr ∗ (vt / (mt + ϵ))
Answer :- a
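A minimal sketch of one Adam step matching option (a); the bias-corrected moments of the full algorithm are omitted here, mirroring the simplified form in the options above.

```python
import numpy as np

def adam_step(w, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; bias correction of m and v omitted for brevity,
    matching the simplified form in the options above."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    w = w - lr * m / (np.sqrt(v) + eps)      # update rule from option (a)
    return w, m, v
```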
9. What is the advantage of using mini-batch gradient descent over batch gradient descent?
- Mini-batch gradient descent is more computationally efficient than batch gradient descent.
- Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
- Mini-batch gradient descent gives us a better solution.
- Mini-batch gradient descent can converge faster than batch gradient descent.
Answer :- a, d
10. Which of the following is a variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters?
- Momentum optimization
- Stochastic gradient descent
- Nesterov accelerated gradient descent
- Adagrad
Answer :- c
NPTEL Deep Learning – IIT Ropar Week 3 Assignment Answer 2023
1. Which of the following statements about backpropagation is true?
- It is used to optimize the weights in a neural network.
- It is used to compute the output of a neural network.
- It is used to initialize the weights in a neural network.
- It is used to regularize the weights in a neural network.
Answer:- a
2. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?
Answer:- b. −(y log p + (1 − y) log(1 − p))
3. Let yi be the true class label of the i-th instance and pi be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for the multi-class cross-entropy loss.
Answer:- c. −∑_{c=1}^{M} y_{o,c} log(p_{o,c})
4. Can cross-entropy loss be negative between two probability distributions?
- Yes
- No
Answer:- b
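A quick numeric check of why the answer is No: every qi ≤ 1 makes log qi ≤ 0, so each term −pi log qi is nonnegative. A minimal sketch:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i); nonnegative since log(q_i) <= 0."""
    return -np.sum(p * np.log(q))

p = np.array([0.7, 0.3])
q = np.array([0.6, 0.4])
print(cross_entropy(p, q))  # ~0.632, always >= 0
```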
5. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?
- p=q
- All the values in p are lower than the corresponding values in q
- All the values in p are higher than the corresponding values in q
- p = 0 [0 is a vector]
Answer:- a
6. Which of the following is false about cross-entropy loss between two probability distributions?
It is always in range (0,1)
It can be negative.
It is always positive.
It can be 1.
Answer:- a, b
7. The probability of all the events x1, x2, x3, …, xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (base of log is 2)
- H(X)≤1
- H(X)=1
- H(X)≥1
- We can’t say anything conclusive with the provided information.
Answer:- c
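The one-line derivation: with n equally likely events, each pi = 1/n, so

```latex
H(X) = -\sum_{i=1}^{n} \frac{1}{n}\log_2\!\frac{1}{n} = \log_2 n \;\ge\; 1 \quad \text{for } n \ge 2.
```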
8. Suppose we have a problem where data x and label y are related by y = x⁴ + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?
- Linear
- Relu
- Sigmoid
- tan⁻¹(x)
Answer:- a
9. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?
- Event A has a high information content
- Event B has a low information content
- Event A has a low information content
- Event B has a high information content
Answer:- c, d
10. Which of the following activation functions produces only outputs strictly greater than 0?
- Sigmoid
- ReLU
- Tanh
- Linear
Answer:- a
NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answer 2023
1. What is the range of the sigmoid function σ(x) = 1/(1 + e⁻ˣ)?
- (−1,1)
- (0,1)
- (−∞, ∞)
- (0,∞)
Answer :- b. (0, 1)
The sigmoid function σ(x) = 1/(1 + e⁻ˣ) outputs values between 0 and 1: as x approaches positive infinity, σ(x) approaches 1, and as x approaches negative infinity, σ(x) approaches 0. The range is therefore (0, 1); the function never actually reaches 0 or 1.
2. What happens to the output of the sigmoid function as |x| becomes very small?
- The output approaches 0.5
- The output approaches 1.
- The output oscillates between 0 and 1.
- The output becomes undefined.
Answer :- a. The output approaches 0.5
As |x| becomes very small (x close to 0), e⁻ˣ is very close to 1, so the denominator 1 + e⁻ˣ is approximately 2 and the output approaches 1/2 = 0.5.
3. Which of the following theorem states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?
- Bayes’ theorem
- Central limit theorem
- Fourier’s theorem
- Universal approximation theorem
Answer :- d. Universal approximation theorem
The universal approximation theorem states that a feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function to arbitrary accuracy, given a sufficiently large number of neurons in that hidden layer.
4. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?
- 301
- 451
- 150
- 500
Answer :- 301
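The arithmetic behind 301, under the tower construction used in the course lectures: each rectangle (tower) is built from a pair of sigmoid neurons, and one output neuron sums all the towers.

```latex
150 \times 2 + 1 = 301
```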
5. A neural network has an input layer with 2 neurons, two hidden layers with 5 neurons each, and an output layer with 3 neurons. How many weights are there in total? (Don’t assume any bias terms in the network.)
Answer :- 50
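The count: with no biases, the number of weights between consecutive layers is the product of their sizes.

```latex
2 \times 5 + 5 \times 5 + 5 \times 3 = 10 + 25 + 15 = 50
```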
6. What is the derivative of the ReLU activation function with respect to its input at 0?
- 0
- 1
- −1
- Not differentiable
Answer :- Not differentiable
7. Consider the function f(x) = x³ − 3x² + 2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?
Answer :- Any value in the range 1.85 to 1.95 (the computed value is ≈ 1.90)
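A short sketch that runs the three updates with f′(x) = 3x² − 6x and lands inside the accepted range:

```python
x, lr = 4.0, 0.1
for step in range(3):
    grad = 3 * x**2 - 6 * x       # derivative of x^3 - 3x^2 + 2
    x = x - lr * grad
    print(step + 1, round(x, 4))  # 1.6, 1.792, 1.9038
```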
8. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?
- A multilayer network of sigmoid neurons can represent any Boolean function.
- A multilayer network of sigmoid neurons can represent any continuous function.
- A multilayer network of sigmoid neurons can represent any function.
- A multilayer network of sigmoid neurons can represent any linear function.
Answer :- b. A multilayer network of sigmoid neurons can represent any continuous function.
This reflects the universal approximation theorem: a feedforward network with a single hidden layer of sigmoid (or similar) neurons can approximate any continuous function to arbitrary accuracy, given sufficiently many hidden neurons.
9. How many boolean functions can be designed for 3 inputs?
- 65,536
- 82
- 256
- 64
Answer :- 256
10. How many neurons do you need in the hidden layer of a perceptron network to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)
- 16
- 64
- 16
- 32
Answer :- 64
NPTEL Deep Learning – IIT Ropar Week 1 Assignment Answer 2023
1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?
- Yes
- No
- Cannot be determined from the given information
Answer :- a. Yes
2. What is the perceptron algorithm used for?
- Clustering data points
- Finding the shortest path in a graph
- Classifying data
- Solving optimization problems
Answer :- c. Classifying data
The perceptron algorithm is a supervised learning algorithm for binary classification. It assigns an input vector to one of two classes based on a linear combination of the input features and associated weights, and it is particularly effective when the data is linearly separable, since it tries to find a hyperplane that separates the two classes.
3. What is the most common activation function used in perceptrons?
- Sigmoid
- ReLU
- Tanh
- Step
Answer :- d. Step
The step function returns 1 if the input is greater than or equal to a threshold value and 0 otherwise. It is one of the simplest activation functions and was used in early versions of the perceptron.
4. Which of the following Boolean functions cannot be implemented by a perceptron?
- AND
- OR
- XOR
- NOT
Answer :- XOR
5. We are given 4 points in R², say x1 = (0,1), x2 = (−1,−1), x3 = (2,3), x4 = (4,−5). The labels of x1, x2, x3, x4 are given to be −1, 1, −1, 1. We initiate the perceptron algorithm with an initial weight w0 = (0,0) on this data. What will be the value of the weight vector after the algorithm converges? (Take points in sequential order from x1 to x4; an update happens when the value of the weight changes.)
- (0,0)
- (−2,−2)
- (−2,−3)
- (1,1)
Answer :- (−2,−3)
6. We are given the following data (shown in the original post). Can you classify every label correctly by training a perceptron algorithm? (assume the bias to be 0 while training)
- Yes
- No
Answer :- b. No
7. Suppose we have a boolean function that takes 5 inputs x1, x2, x3, x4, x5. We have an MP neuron with parameter θ = 1. For how many inputs will this MP neuron give output y = 1?
- 21
- 31
- 30
- 32
Answer :- b. 31
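The count: with threshold θ = 1, the neuron fires for every input except the all-zeros vector, so

```latex
2^5 - 1 = 31.
```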
8. Which of the following best represents the meaning of term “Artificial Intelligence”?
- The ability of a machine to perform tasks that normally require human intelligence
- The ability of a machine to perform simple, repetitive tasks
- The ability of a machine to follow a set of pre-defined rules
- The ability of a machine to communicate with other machines
Answer :- a. The ability of a machine to perform tasks that normally require human intelligence.
Artificial Intelligence (AI) refers to the capability of machines to perform tasks that typically require human intelligence, such as problem-solving, learning, reasoning, understanding natural language, and adapting to new situations, without direct human intervention.
9. Which of the following statements is true about error surfaces in deep learning?
- They are always convex functions.
- They can have multiple local minima.
- They are never continuous.
- They are always linear functions.
Answer :- b. They can have multiple local minima.
Error surfaces in deep learning (also called loss surfaces) relate the model’s parameters to its error on the training data. They are typically non-convex, so they can have multiple local minima, maxima, and saddle points; a local minimum need not be the global minimum, which represents the best set of parameters for the model.
10. What is the output of the following MP neuron for the AND Boolean function?
y = 1 if x1 + x2 + x3 ≥ 1, and y = 0 otherwise
- y=1 for (x1,x2,x3)=(0,1,1)
- y=0 for (x1,x2,x3)=(0,0,1)
- y=1 for (x1,x2,x3)=(1,1,1)
- y=0 for (x1,x2,x3)=(1,0,0)
Answer :- a, c (y = 1 for (x1,x2,x3) = (0,1,1) and y = 1 for (x1,x2,x3) = (1,1,1))