# [Week 1 to 8] NPTEL Introduction To Machine Learning – IITKGP Assignment Answers 2023

## NPTEL Introduction To Machine Learning – IITKGP Week 8 Assignment Answer 2023

1. What is true about K-Means clustering?

1. K-means is extremely sensitive to cluster center initializations
a. 1 and 2
b. 1 and 3
c. All of the above
d. 2 and 3
`Answer :- c`

2. In which of the following cases will K-Means clustering fail to give good results? (Mark all that apply)
a. Data points with outliers
b. Data points with round shapes
c. Data points with non-convex shapes
d. Data points with different densities

`Answer :- a, c, d`

3. Which of the following clustering algorithms suffers from the problem of convergence at local optima? (Mark all that apply)
a. K-Means clustering algorithm
b. Agglomerative clustering algorithm
c. Expectation-Maximization clustering algorithm
d. Diverse clustering algorithm

`Answer :- a, c`

4.

`Answer :- b`

5. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, the clusters C1, C2, and C3 have the following observations:
C1: {(1,1), (4,4), (7,7)}
C2: {(0,4), (4,0)}
C3: {(5,5), (9,9)}
What will be the cluster centroids if you want to proceed to the second iteration?
a. C1: (4,4), C2: (2,2), C3: (7,7)
b. C1: (2,2), C2: (0,0), C3: (5,5)
c. C1: (6,6), C2: (4,4), C3: (9,9)
d. None of these

`Answer :- a`
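
The centroid update in Question 5 is a coordinate-wise mean over each cluster's members. A minimal sketch, using the cluster memberships from the question (the helper name `centroid` is only illustrative):

```python
# Cluster memberships after the first iteration, as given in Question 5.
clusters = {
    "C1": [(1, 1), (4, 4), (7, 7)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}

def centroid(points):
    """Return the coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# New centroid for every cluster, used to start the second iteration.
centroids = {name: centroid(pts) for name, pts in clusters.items()}
print(centroids)
```

The result matches option a: C1 at (4,4), C2 at (2,2), and C3 at (7,7).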

6. Following Question 5, what will be the Manhattan distance for observation (9, 9) from cluster centroid C1 in the second iteration?
a. 10
b. 5
c. 6
d. 7

`Answer :- a`
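
The Manhattan distance in Question 6 is the sum of absolute coordinate differences. A short check, assuming the C1 centroid (4, 4) from the answer to Question 5:

```python
def manhattan(p, q):
    """Sum of absolute differences between matching coordinates."""
    return sum(abs(a - b) for a, b in zip(p, q))

c1 = (4, 4)  # centroid of C1 after the update in Question 5
distance = manhattan((9, 9), c1)
print(distance)  # |9 - 4| + |9 - 4|
```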

7. Which of the following is not a clustering approach?
a. Hierarchical
b. Partitioning
c. Bagging
d. Density-Based

`Answer :- c`

8. Which one of the following is correct?
b. Single linkage clustering is computationally cheaper compared to K-means clustering.
c. K-Means clustering is computationally cheaper compared to single linkage clustering.
d. None of the above.

`Answer :- c`

9. Considering single-link and complete-link hierarchical clustering, is it possible for a point to be closer to points in other clusters than to points in its own cluster? If so, in which approach will this tend to be observed?
a. No

`Answer :- d`

10.

`Answer :- d`

11. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?
a. In distance calculation it will give the same weights for all features
b. You always get the same clusters if you use or don’t use feature scaling
c. In Manhattan distance it is an important step but in Euclidean it is not
d. None of these

`Answer :- a`

12. Which of the following options is a measure of internal evaluation of a clustering algorithm?
a. Rand Index
b. Jaccard Index
c. Davies-Bouldin Index
d. F-score

`Answer :- c`

13. Given, A= {0,1,2,5,6} and B = {0,2,3,4,5,7,9}, calculate Jaccard Index of these two sets.
a. 0.50
b. 0.25
c. 0.33
d. 0.41

`Answer :- c`
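
The Jaccard index in Question 13 is the size of the intersection divided by the size of the union. A quick check with Python sets, using the two sets from the question:

```python
A = {0, 1, 2, 5, 6}
B = {0, 2, 3, 4, 5, 7, 9}

intersection = A & B  # elements present in both sets
union = A | B         # elements present in either set

jaccard = len(intersection) / len(union)
print(len(intersection), len(union), round(jaccard, 2))
```

Here the intersection has 3 elements and the union has 9, so the index is 3/9, which rounds to 0.33.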

14. Suppose you run K-means clustering algorithm on a given dataset. What are the factors on which the final clusters depend?
I. The value of K
II. The initial cluster seeds chosen
III. The distance function used.
a. I only
b. II only
c. I and II only
d. I, II and III

`Answer :- d`

15. Consider a training dataset with two numerical features namely, height of a person and age of the person. The height varies from 4-8 and age varies from 1-100. We wish to perform K-Means clustering on the dataset. Which of the following options is correct?
a. We should use Feature-scaling for K-Means Algorithm.
b. Feature Scaling can not be used for KMeans Algorithm.
c. You always get the same clusters if you use or don’t use feature scaling.
d. None of these

`Answer :- a`
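
A small sketch of why scaling matters for the height/age ranges in Question 15. The three sample people and the choice of min-max scaling are illustrative assumptions, not part of the assignment:

```python
def min_max_scale(value, low, high):
    """Map a value from [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# Hypothetical people as (height, age); ranges follow the question: 4-8 and 1-100.
a = (4.0, 30.0)
b = (8.0, 30.0)  # largest possible height gap, same age
c = (4.0, 40.0)  # same height, modest age gap

def scale(person):
    """Scale both features to [0, 1] using the ranges from the question."""
    return (min_max_scale(person[0], 4, 8), min_max_scale(person[1], 1, 100))

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
scaled_ab, scaled_ac = euclidean(scale(a), scale(b)), euclidean(scale(a), scale(c))
print(raw_ab, raw_ac, scaled_ab, round(scaled_ac, 3))
```

Without scaling, the 10-year age gap outweighs the maximal height gap simply because age has a wider numeric range. After scaling, the height gap dominates, as one would expect.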

## NPTEL Introduction To Machine Learning – IITKGP Week 7 Assignment Answer 2023

1. Find the most specific concept using Find-S algorithm.

`Answer :- a`

2. Find the number of instances possible in X using the values that can be seen in the table in Q1.
a. 12
b. 48
c. 36
d. 24

`Answer :- b`

3. Find VC(H). [VC stands for Vapnik-Chervonenkis Dimension]
a. 2
b. 3
c. 5
d. 4

`Answer :- a`

4. Can VC dimension of H be 3?
a. Yes
b. No

`Answer :- b`

5. Let C be the classifier that returns a majority vote of the three classifiers. Assuming the errors of the ci are independent, what is the probability that C(x) will be correct on a new test example x?
a. 0.1815
b. 0.1215
c. 0.5505
d. 0.099

`Answer :- c`

6. Suppose you have run AdaBoost on a training set for three boosting iterations. The results are classifiers h1, h2, and h3, with coefficients a1 = 0.2, a2 = -0.3, and a3 = -0.2. You find that the classifiers' results on a test example x are h1(x) = 1, h2(x) = 1, and h3(x) = -1. What is the class returned by the AdaBoost ensemble classifier H on test example x?
a. 1
b. -1

`Answer :- a`
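
The ensemble output in Question 6 is the sign of the coefficient-weighted sum of the individual predictions. A minimal check with the numbers from the question:

```python
alphas = [0.2, -0.3, -0.2]  # coefficients a1, a2, a3 from the question
votes = [1, 1, -1]          # h1(x), h2(x), h3(x)

# Weighted vote: 0.2*1 + (-0.3)*1 + (-0.2)*(-1)
score = sum(alpha * h for alpha, h in zip(alphas, votes))
prediction = 1 if score > 0 else -1
print(round(score, 2), prediction)
```

The weighted sum is about 0.1, which is positive, so the ensemble returns class 1.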

7. Bagging is done to _______.
a. increase bias
b. decrease bias
c. increase variance
d. decrease variance

`Answer :- d`

8. Weak learners are the ones used as classifiers in Boosting algorithms. They are called weak learners because________
a. Error rate greater than 0.5
b. Error rate less than 0.5
c. No error

`Answer :- b`

9. Dropout is used as a regularization technique in Neural Networks where many different models are trained on different subsets of the data. In ensemble learning, dropout techniques would be similar to
a. Bagging
b. Boosting
c. None of the above

`Answer :- a`

10. Which of the following options is/are correct regarding the benefits of an ensemble model?

1. Better performance
2. More generalized model
3. Better interpretability

a. 1 and 3
b. 2 and 3
c. 1 and 2
d. 1, 2 and 3

`Answer :- c`

11.

`Answer :- b, d`

12. The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis space H2. Which of the following can be inferred from this?
a. The number of examples required for learning a hypothesis in H1 is larger than the number of examples required for H2.
b. The number of examples required for learning a hypothesis in H1 is smaller than the number of examples required for H2.
c. No relation to number of samples required for PAC learning.

`Answer :- a`

13. For a particular learning task, if the required error parameter ε changes from 0.2 to 0.01, then how many more samples will be required for PAC learning?
a. Same
b. 2 times
c. 20 times
d. 200 times

`Answer :- c`
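
The factor of 20 follows from sample-complexity bounds that scale as 1/ε. The sketch below assumes the standard bound for a consistent learner over a finite hypothesis space, m ≥ (ln|H| + ln(1/δ)) / ε. The question does not state which bound is intended, and the values of δ and |H| are hypothetical (they cancel in the ratio):

```python
import math

def pac_sample_bound(epsilon, delta, hypothesis_count):
    """Finite-hypothesis-space bound: (ln|H| + ln(1/delta)) / epsilon."""
    return (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon

# Hypothetical confidence parameter and hypothesis-space size.
delta, hypothesis_count = 0.05, 1000

before = pac_sample_bound(0.2, delta, hypothesis_count)
after = pac_sample_bound(0.01, delta, hypothesis_count)
ratio = after / before
print(round(ratio, 6))
```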

14. In boosting, which data points are assigned higher weights during the training of subsequent models?
a. Data points that are classified correctly by the previous models.
b. Data points that are misclassified by the previous models.
c. Data points that are randomly selected from the training data.
d. Data points that are ignored during training.

`Answer :- b`

15. In AdaBoost, how are the individual weak learners combined to form the final strong ensemble model’s prediction?
a. By taking the majority vote of all weak learners’ predictions.
b. By averaging the predictions of all weak learners.
c. By weighting the predictions of weak learners based on their accuracy.
d. By selecting the prediction of the weak learner with the highest accuracy.

`Answer :- c`

## NPTEL Introduction To Machine Learning – IITKGP Week 6 Assignment Answer 2023

Q1. Find the appropriate weights for w0, w1, and w2 to represent the AND function. Threshold function = (1, if output > 0; 0 otherwise). x0 and x1 are the inputs and b1 = 1 is the bias.
a. w0=1, w1=1, w2=1
b. w0=1, w1=1, w2=-1
c. w0=-1, w1=-1, w2=-1
d. w0=2, w1=-2, w2=-1

`Answer:- b`

Q2. Fill in the correct weights to represent OR function:
a. w0=1, w1=1, w2=0
b. w0=1, w1=1, w2=1
c. w0=1, w1=1, w2=-1
d. w0=-1, w1=-1, w2=-1

`Answer:- a`
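
The weight choices in Q1 and Q2 can be checked by evaluating the threshold unit on all four input pairs. This sketch assumes that w0 and w1 multiply the inputs x0 and x1 and that w2 multiplies the bias input of 1; the original figure is not reproduced here, so that wiring is an assumption:

```python
from itertools import product

def perceptron(x0, x1, w0, w1, w2, bias=1):
    """Threshold unit: 1 if the weighted sum is strictly positive, else 0."""
    return 1 if w0 * x0 + w1 * x1 + w2 * bias > 0 else 0

and_weights = (1, 1, -1)  # Q1, option b
or_weights = (1, 1, 0)    # Q2, option a

inputs = list(product((0, 1), repeat=2))
and_table = {x: perceptron(*x, *and_weights) for x in inputs}
or_table = {x: perceptron(*x, *or_weights) for x in inputs}
print(and_table)
print(or_table)
```

Under that assumption, the first table reproduces AND (only the input (1, 1) fires) and the second reproduces OR (every input except (0, 0) fires).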

Q3. Which of the following gives non-linearity to a neural network?
b. Bias
c. ReLU Activation Function
d. None

`Answer:- c`

Q4.

Suppose you are to design a system where you want to perform word prediction, also known as language modeling. You are to take the output from the previous state and also the input at each step to predict the next word. The inputs at each step are the words for which the next words are to be predicted. Which of the following neural networks would you use?
a. Multi-Layer Perceptron
b. Recurrent Neural Network
c. Convolutional Neural Network
d. Perceptron

`Answer:- b`

Q5.

`Answer:- a`

Q6.

`Answer:- c`

Q7.

`Answer:- b`

Q8.

`Answer:- b`

Q9.

`Answer:- a, c`

Q10.

`Answer:- a`

Q11.

`Answer:- b`

Q12.

`Answer:- c`

Q13.

`Answer:- b`

Q14.

`Answer:- a`

Q15.

`Answer:- b`

## NPTEL Introduction To Machine Learning – IITKGP Week 5 Assignment Answer 2023

1. What would be the ideal complexity of the curve which can be used for separating the two classes shown in the image below?

a. Linear
c. Cubic
d. insufficient data to draw a conclusion

`Answer :- a`

2. Suppose you are using a Linear SVM classifier with 2 class classification problem. Now you have been given the following data in which some points are circled red that are representing support vectors.

If you remove any one of the red points from the data, will the decision boundary change?

a. Yes
b. No

`Answer :- a`

3. What do you mean by a hard margin in SVM Classification?

a. The SVM allows very low error in classification
b. The SVM allows high amount of error in classification
c. Both are True
d. Both are False

`Answer :- a`

4.

`Answer :- b`

5. After training an SVM, we can discard all examples which are not support vectors and can still classify new examples?

a. True
b. False

`Answer :- a`

6. Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel function (polynomial of degree 2) that uses the slack variable C as one of its hyperparameters.

What would happen when you use very large value of C (C->infinity)?

a. We can still classify data correctly for given setting of hyper parameter C.
b. We can not classify data correctly for given setting of hyper parameter C
c. None of the above

`Answer :- a`

7. Following Question 6, what would happen when you use very small C (C~0)?

a. Data will be correctly classified
b. Misclassification would happen
c. None of these

`Answer :- b`

8.

`Answer :- a`

9.

`Answer :- c`

10. What type of kernel function is commonly used for non-linear classification tasks in SVM?

a. Linear kernel
b. Polynomial kernel
c. Sigmoid kernel
d. Radial Basis Function (RBF) kernel

`Answer :- d`

11. Which of the following statements is/are true about kernel in SVM?

1. Kernel functions map low-dimensional data to a high-dimensional space
2. It’s a similarity function

a. 1 is True but 2 is False
b. 1 is False but 2 is True
c. Both are True
d. Both are False

`Answer :- c`

12. The soft-margin SVM is preferred over the hard-margin SVM when:

a. The data is linearly separable
b. The data is noisy
c. The data contains overlapping points

`Answer :- b, c`

13.

`Answer :- c`

14. What is the primary advantage of Kernel SVM compared to traditional SVM with a linear kernel?

a. Kernel SVM requires less computational resources.
b. Kernel SVM does not require tuning of hyperparameters.
c. Kernel SVM can capture complex non-linear relationships between data points.
d. Kernel SVM is more robust to noisy data.

`Answer :- c`

15. What is the sigmoid function’s role in logistic regression?

a. The sigmoid function transforms the input features to a higher-dimensional space.
b. The sigmoid function calculates the dot product of input features and weights.
c. The sigmoid function defines the learning rate for gradient descent.
d. The sigmoid function maps the linear combination of features to a probability value.

`Answer :- d`

## NPTEL Introduction To Machine Learning – IITKGP Week 4 Assignment Answer 2023

Questions 1-4 with the data provided below:
A spam filtering system has a probability of 0.95 of correctly classifying a mail as spam and a 0.10 probability of giving false positives. It is estimated that 0.5% of the mails are actual spam mails.
Q1. Suppose that the system is now given a new mail to be classified as spam/not-spam. What is the probability that the mail will be classified as spam?
a. 0.89575
b. 0.10425
c. 0.00475
d. 0.09950

`Answer:- b`

Q2. Given that a mail is classified as spam by the system, find the probability that the mail is actually spam.
a. 0.04556
b. 0.95444
c. 0.00475
d. 0.99525

`Answer:- a`

Q3. Given that a mail is classified as not spam, what is the probability that the mail is actually not spam?
a. 0.10425
b. 0.89575
c. 0.003
d. 0.997

`Answer:- d`

Q4. Find the probability that the mail is misclassified:
a. 0.90025
b. 0.09975
c. 0.8955
d. 0.1045

`Answer:- b`
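
Questions 1-4 all follow from the law of total probability and Bayes' rule. A sketch using the three numbers from the problem statement; the variable names (with "ham" meaning not-spam and "flag" meaning classified as spam) are illustrative:

```python
p_spam = 0.005            # prior: 0.5% of mails are spam
p_flag_given_spam = 0.95  # spam correctly classified as spam
p_flag_given_ham = 0.10   # false-positive rate

p_ham = 1 - p_spam

# Q1: total probability that a mail is classified as spam.
p_flag = p_flag_given_spam * p_spam + p_flag_given_ham * p_ham

# Q2: Bayes' rule, P(spam | classified as spam).
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag

# Q3: Bayes' rule, P(not spam | classified as not spam).
p_ham_given_pass = (1 - p_flag_given_ham) * p_ham / (1 - p_flag)

# Q4: a mail is misclassified if it is missed spam or flagged ham.
p_misclassified = (1 - p_flag_given_spam) * p_spam + p_flag_given_ham * p_ham

print(round(p_flag, 5), round(p_spam_given_flag, 5),
      round(p_ham_given_pass, 5), round(p_misclassified, 5))
```

The Q1, Q2, and Q4 values match options b, a, and b. The Q3 value computes to roughly 0.9997, for which option d is the nearest listed choice.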

Q5. What is the naive assumption in a Naive Bayes Classifier?
a. All the classes are independent of each other
b. All the features of a class are independent of each other
c. The most probable feature for a class is the most important feature to be considered for classification
d. All the features of a class are conditionally dependent on each other.

`Answer:- b`

Q6.

`Answer:- b`

Q7. Find P (K=0| a=1, b=1).
a. 1/3
b. 2/3
c. 1/9
d. 8/9

`Answer:- b`

Q8. What is the joint probability distribution in terms of conditional probabilities?
a. P(D1) * P(D2|D1) * P(S1|D1) * P(S2|D1) * P(S3|D2)
b. P(D1) * P(D2) * P(S1|D1) * P(S2|D1) * P(S3|D1, D2)
c. P(D1) * P(D2) * P(S1|D2) * P(S2|D2) * P(S3|D2)
d. P(D1) * P(D2) * P(S1|D1) * P(S2|D1, D2) * P(S3|D2)

`Answer:- d`

Q9. Suppose P(D1) = 0.4, P(D2) = 0.7, P(S1|D1) = 0.3 and P(S1|D1′) = 0.6. Find P(S1).
a. 0.12
b. 0.48
c. 0.36
d. 0.60

`Answer:- b`
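
P(S1) in Q9 comes from marginalizing over D1. A short check with the values from the question, where D1′ denotes the complement of D1:

```python
p_d1 = 0.4
p_s1_given_d1 = 0.3
p_s1_given_not_d1 = 0.6

# Law of total probability: P(S1) = P(S1|D1) P(D1) + P(S1|D1') P(D1')
p_s1 = p_s1_given_d1 * p_d1 + p_s1_given_not_d1 * (1 - p_d1)
print(round(p_s1, 2))
```

The two terms are 0.12 and 0.36, giving 0.48. P(D2) is not needed for this marginal.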

Q10. What is the Markov blanket of the variable S3?
a. D1
b. D2
c. D1 and D2
d. None

`Answer:- b`

Q11.

`Answer:- b`

Q12.

`Answer:- b`

Questions 13-14 with the data given below:
In an oral exam you have to solve exactly one problem, which might be one of three types, A, B, or C, which will come up with probabilities 30%, 20%, and 50%, respectively. During your preparation you have solved 9 of 10 problems of type A, 2 of 10 problems of type B, and 6 of 10 problems of type C.

Q13. What is the probability that you will solve the problem of the exam?
a. 0.61
b. 0.39
c. 0.50
d. 0.20

`Answer:- a`

Q14. Given you have solved the problem, what is the probability that it was of type A?
a. 0.35
b. 0.50
c. 0.56
d. 0.44

`Answer:- d`
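
Questions 13 and 14 can be worked out as follows, treating the preparation success rates as the probability of solving each problem type (the reading the question implies):

```python
p_type = {"A": 0.30, "B": 0.20, "C": 0.50}           # chance each type comes up
p_solve_given_type = {"A": 0.9, "B": 0.2, "C": 0.6}  # 9/10, 2/10, 6/10 solved

# Q13: law of total probability over the three problem types.
p_solve = sum(p_type[t] * p_solve_given_type[t] for t in p_type)

# Q14: Bayes' rule, P(type A | solved).
p_a_given_solved = p_type["A"] * p_solve_given_type["A"] / p_solve

print(round(p_solve, 2), round(p_a_given_solved, 2))
```

The total is 0.27 + 0.04 + 0.30 = 0.61, and 0.27 / 0.61 rounds to 0.44.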

Q15. Naive Bayes is a popular classification algorithm in machine learning. Which of the following statements is/are true about Naive Bayes?
a. Naive Bayes assumes that all features are independent of each other, given the class.
b. It is particularly well-suited for text classification tasks, like spam detection.
c. Naive Bayes can handle missing values in the dataset without any special treatment.
d. It is a complex algorithm that requires a large amount of training data.

`Answer:- a, b`

## NPTEL Introduction To Machine Learning – IITKGP Week 3 Assignment Answer 2023

Q1. Fill in the blanks:
K-Nearest Neighbor is a ______, ______ algorithm.

a. Non-parametric, eager
b. Parametric, eager
c. Non-parametric, lazy
d. Parametric, lazy

`Answer :- c`

2. You have been given the following 2 statements. Find out which of these options is/are true in the case of k-NN.

(i) In case of very large value of k, we may include points from other classes into the neighborhood.
(ii) In case of too small value of k, the algorithm is very sensitive to noise.

a. (i) is True and (ii) is False
b. (i) is False and (ii) is True
c. Both are True
d. Both are False

`Answer :- c`

3. State whether the statement is True/False: k-NN algorithm does more computation on test time rather than train time.

a. True
b. False

`Answer :- a`

4. Suppose you are given the following images (1 represents the left image, 2 represents the middle and 3 represents the right). Now your task is to find out the value of k in k-NN in each of the images shown below. Here k1 is for the 1st, k2 is for the 2nd and k3 is for the 3rd figure.

a. k1 > k2> k3
b. k1 < k2> k3
c. k1 < k2 < k3
d. None of these

`Answer :- c`

5. Which of the following necessitates feature reduction in machine learning?

a. Irrelevant and redundant features
b. Limited training data
c. Limited computational resources.
d. All of the above

`Answer :- d`

6. Suppose you are given the following data, where x and y are the 2 input variables and Class is the dependent variable.

`Answer :- a`

7. What is the optimum number of principal components in the below figure?

a. 10
b. 20
c. 30
d. 40

`Answer :- c`

8. Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct?

a. Higher value of ‘k’ means more regularization
b. Higher value of ‘k’ means less regularization

`Answer :- b`

9. In collaborative filtering-based recommendation, the items are recommended based on :

a. Similar users
b. Similar items
c. Both of the above
d. None of the above

`Answer :- a`

10. The major limitation of collaborative filtering is:

a. Cold start
b. Overspecialization
c. None of the above

`Answer :- a`

11. Consider the figures below. Which figure shows the most probable PC component directions for the data points?

`Answer :- a`

12. Suppose that you wish to reduce the number of dimensions of a given dataset to k dimensions using PCA. Which of the following statements is correct?

a. Higher k means more regularization
b. Higher k means less regularization
c. Can’t Say

`Answer :- b`

13. Suppose you are given 7 plots 1-7 (left to right) and you want to compare Pearson correlation coefficients between variables of each plot. Which of the following is true?

`Answer :- b`

14. Imagine you are dealing with 20 class classification problem. What is the maximum number of discriminant vectors that can be produced by LDA?
a. 20
b. 19
c. 21
d. 10

`Answer :- b`

15. In which of the following situations is a collaborative filtering algorithm appropriate?

a. You manage an online bookstore and you have the book ratings from many users. For each user, you want to recommend other books he/she will like based on her previous ratings and other users’ ratings.
b. You manage an online bookstore and you have the book ratings from many users. You want to predict the expected sales volume (No of books sold) as a function of average rating of a book.
c. Both A and B
d. None of the above

`Answer :- a`

## NPTEL Introduction To Machine Learning – IITKGP Week 2 Assignment Answer 2023

1. What is Entropy(Emotion | Wig = Y)?

a. 1
b. 0
c. 0.50
d. 0.20

`Answer :- a`

2. What is Entropy(Emotion | Ears = 3)?

a. 1
b. 0
c. 0.50
d. 0.20

`Answer :- b`
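
The data table for Questions 1-3 is not reproduced on this page, so the answers cannot be recomputed here. As a sketch of the calculation, entropy is 1 bit when the two emotion labels are evenly split within a subset and 0 when the subset contains a single label. The label lists below are hypothetical stand-ins, not the assignment data:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical subsets: an even split and a pure subset.
balanced = entropy(["happy", "sad", "happy", "sad"])
pure = entropy(["happy", "happy", "happy"])
print(balanced, pure)
```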

3. Which attribute should you choose as root of the decision tree?

a. Color
b. Wig
c. Number of ears
d. Any one of the previous three attributes

`Answer :- a`

4. In linear regression, the output is:

a. Discrete
b. Categorical
c. Continuous
d. May be discrete or continuous

`Answer :- c`

5. Consider applying linear regression with the hypothesis h_θ(x) = θ_0 + θ_1 x. The training data is given in the table.

where m is the number of training examples and h_θ(x^(i)) is the value of the linear regression hypothesis at point i. If θ = [1, 1], find J(θ).

a. 0
b. 1
c. 2
d. 0.5

`Answer :- b`

6. Specify whether the following statement is true or false? “The ID3 algorithm is guaranteed to find the optimal decision tree”

a. True
b. False

`Answer :- b`

7. Identify whether the following statement is true or false? “A classifier trained on less training data is less likely to overfit”

a. True
b. False

`Answer :- b`

8. Identify whether the following statement is true or false? “Overfitting is more likely when the hypothesis space is small”

a. True
b. False

`Answer :- b`

9. Traditionally, when we have a real-valued input attribute during decision-tree learning, we consider a binary split according to whether the attribute is above or below some threshold. One of your friends suggests that instead we should just have a multiway split with one branch for each of the distinct values of the attribute. From the list below choose the single biggest problem with your friend’s suggestion:

a. It is too computationally expensive
b. It would probably result in a decision tree that scores badly on the training set and a test set
c. It would probably result in a decision tree that scores well on the training set but badly on a test set
d. It would probably result in a decision tree that scores well on a test set but badly on a training set

`Answer :- c`

10. Which of the following statements about decision trees is/are true?

a. Decision trees can handle both categorical and numerical data.
b. Decision trees are resistant to overfitting.
c. Decision trees are not interpretable.
d. Decision trees are only suitable for binary classification problems.

`Answer :- a`

11. Which of the following techniques can be used to handle overfitting in decision trees?

a. Pruning
b. Increasing the tree depth
c. Decreasing the minimum number of samples required to split a node
d. Adding more features to the dataset

`Answer :- a`

12. Which of the following is a measure used for selecting the best split in decision trees?

a. Gini Index
b. Support Vector Machine
c. K-Means Clustering
d. Naive Bayes

`Answer :- a`

13. What is the purpose of the decision tree’s root node in machine learning?

a. It represents the class labels of the training data.
b. It serves as the starting point for tree traversal during prediction.
c. It contains the feature values of the training data.
d. It determines the stopping criterion for tree construction.

`Answer :- b`

14. Which of the following statements about linear regression is true?

a. Linear regression is a supervised learning algorithm used for both regression and classification tasks.
b. Linear regression assumes a linear relationship between the independent and dependent variables.
c. Linear regression is not affected by outliers in the data.
d. Linear regression can handle missing values in the dataset.

`Answer :- b`

15. Which of the following techniques can be used to mitigate overfitting in machine learning?

a. Regularization
b. Increasing the model complexity
c. Gathering more training data
d. Feature selection or dimensionality reduction

`Answer :- a, c, d`

## NPTEL Introduction To Machine Learning – IITKGP Week 1 Assignment Answer 2023

1. Which of the following is/are classification tasks?

a. Find the gender of a person by analyzing his writing style
b. Predict the price of a house based on floor area, number of rooms, etc.
c. Predict whether there will be abnormally heavy rainfall next year
d. Predict the number of copies of a book that will be sold this month

```Answer :- a. Find the gender of a person by analyzing his writing style
c. Predict whether there will be abnormally heavy rainfall next year

Explanation:
a. Finding the gender of a person based on writing style involves classifying the person into one of two classes - male or female.

c. Predicting whether there will be abnormally heavy rainfall next year involves classifying the occurrence of heavy rainfall as either "abnormally heavy rainfall" or "not abnormally heavy rainfall." This can be treated as a binary classification problem.```

2. A feature F1 can take certain values: A, B, C, D, E, F, and represents the grade of students from a college. Which of the following statements is true in this case?

a. Feature F1 is an example of a nominal variable.
b. Feature F1 is an example of an ordinal variable.
c. It doesn’t belong to any of the above categories.
d. Both of these

```Answer :- b. Feature F1 is an example of an ordinal variable.

Explanation:
In statistics, variables can be classified into different types, and two common types are nominal and ordinal variables.

Nominal variables are categorical variables with no inherent order. The categories in a nominal variable cannot be ranked or ordered in any meaningful way. Examples of nominal variables are eye color, country names, or the types of fruits.

Ordinal variables, on the other hand, have categories with a natural order or ranking. While the exact differences between the categories may not be well-defined, there is a relative ordering among them. Examples of ordinal variables are educational levels (e.g., high school, undergraduate, graduate) or ratings like "good," "better," and "best."

In this case, the feature F1 represents the grades of students from a college, and these grades likely have an inherent order such as A being better than B, and so on. Therefore, F1 is an example of an ordinal variable.```

3. Suppose I have 10,000 emails in my mailbox out of which 200 are spams. The spam detection system detects 150 emails as spams, out of which 50 are actually spam. What is the precision and recall of my spam detection system?

a. Precision = 33.33%, Recall = 25%
b. Precision = 25%, Recall = 33.33%
c. Precision = 33.33%, Recall = 75%
d. Precision = 75%, Recall = 33.33%

`Answer :- a. Precision = 33.33%, Recall = 25%`
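
Precision and recall for Question 3 follow directly from the counts in the question. A quick check:

```python
actual_spam = 200      # spam mails in the mailbox
flagged = 150          # mails the detector marked as spam
true_positives = 50    # flagged mails that really are spam

precision = true_positives / flagged    # correct among flagged mails
recall = true_positives / actual_spam   # caught among all spam mails
print(f"precision={precision:.2%} recall={recall:.2%}")
```

The total of 10,000 mails does not enter either ratio; it would only matter for accuracy or the false-positive rate.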

4. Which of the following statements describes what is most likely TRUE when the amount of training data increases?

a. Training error usually decreases and generalization error usually increases.
b. Training error usually decreases and generalization error usually decreases.
c. Training error usually increases and generalization error usually decreases.
d. Training error usually increases and generalization error usually increases.

`Answer :- c`

5. You trained a learning algorithm and plotted the learning curve. The following figure is obtained.

The algorithm is suffering from

a. High bias
b. High variance
c. Neither

`Answer :- a`

6. I am the marketing consultant of a leading e-commerce website. I have been given a task of making a system that recommends products to users based on their activity on Facebook. I realize that user interests could be highly variable. Hence, I decide to
T1) Cluster the users into communities of like-minded people and
T2) Train separate models for each community to predict which product category (e.g., electronic gadgets, cosmetics, etc.) would be the most relevant to that community.

The task T1 is a/an _________ learning problem and T2 is a/an ________problem.

Choose from the options:

a. Supervised and unsupervised
b. Unsupervised and supervised
c. Supervised and supervised
d. Unsupervised and unsupervised

`Answer :- b`

7. Select the correct equations.
TP – True Positive, TN – True Negative, FP – False Positive, FN – False Negative
i. Precision = TP/(TP+FP)
ii. Recall = FP/(TP+FP)
iii. Recall = TP/(TP+FN)
iv. Accuracy = (TP+TN)/(TP+FP+TN+FN)

a. i, iii, iv
b. i and iii
c. ii and iv
d. i, ii, iii, iv

`Answer :- a. i, iii, iv`

8. Which of the following tasks is NOT a suitable machine learning task(s)?

a. Finding the shortest path between a pair of nodes in a graph
b. Predicting if a stock price will rise or fall
c. Predicting the price of petroleum
d. Grouping mails as spams or non-spams

```Answer :- a. Finding the shortest path between a pair of nodes in a graph.

Explanation:
Machine learning is not typically used for finding the shortest path between nodes in a graph. This problem can be efficiently solved using algorithms like Dijkstra's algorithm or A* search algorithm, which are specifically designed for this purpose and do not involve learning from data.```

9. Which of the following is/are associated with overfitting in machine learning?

a. High bias
b. Low bias
c. Low variance
d. High variance
e. Good performance on training data
f. Poor performance on test data

```Answer :- b. Low bias
d. High variance
e. Good performance on training data
f. Poor performance on test data

Explanation:

High variance (option d) refers to a model that is too complex and captures noise or random fluctuations in the training data. As a result, it performs well on the training data but poorly on unseen test data, indicating overfitting.
Good performance on training data (option e) is a common characteristic of overfitting. The overfitted model fits the training data closely, leading to high accuracy or low error on the training set.
Poor performance on test data (option f) is another sign of overfitting. The overfitted model does not generalize well to new, unseen data, resulting in lower accuracy or higher error on the test set compared to the training set.```

10. Which of the following statements about cross-validation in machine learning is/are true?

a. Cross-validation is used to evaluate a model’s performance on the training data.
b. Cross-validation guarantees that a model will generalize well to unseen data.
c. Cross-validation is only applicable to classification problems and not regression problems.
d. Cross-validation helps in estimating the model’s performance on unseen data by simulating the test phase.

`Answer :- d`

11. What does k-fold cross-validation involve in machine learning?

a. Splitting the dataset into k equal-sized training and test sets.
b. Splitting the dataset into k unequal-sized training and test sets.
c. Partitioning the dataset into k subsets, and iteratively using each subset as a validation set while the remaining k-1 subsets are used for training.
d. Dividing the dataset into k subsets, where each subset represents a unique class label for classification tasks.

`Answer :- c`
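
The procedure in option c can be sketched in plain Python. The function name `k_fold_splits` and the contiguous, unshuffled folds are simplifying assumptions; library implementations typically offer shuffling and stratification as well:

```python
def k_fold_splits(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

splits = list(k_fold_splits(10, 5))
for training, validation in splits:
    print("train:", training, "validate:", validation)
```

Each sample lands in exactly one validation fold and in the training set of the other k-1 rounds, so averaging the k validation scores gives an estimate of performance on unseen data.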

12. What does the term “feature space” refer to in machine learning?

a. The space where the machine learning model is trained.
b. The space where the machine learning model is deployed.
c. The space which is formed by the input variables used in a machine learning model.
d. The space where the output predictions are made by a machine learning model.

```Answer :- c. The space which is formed by the input variables used in a machine learning model.

Explanation:
In machine learning, the term "feature space" refers to the space formed by the input variables (features) used to train a machine learning model. Each data point in the dataset represents a point in this feature space, where the coordinates are the values of the input features.

For example, if you have a dataset with two features, such as "age" and "income," then the feature space would be a two-dimensional space where each data point is represented by a pair of values (age, income).

Machine learning algorithms work by trying to find patterns and relationships in this feature space that can be used to make predictions or classifications. The goal is to find a decision boundary or decision surface that separates different classes or groups based on their feature values.```

13. Which of the following statements is/are true regarding supervised and unsupervised learning?

a. Supervised learning can handle both labeled and unlabeled data.
b. Unsupervised learning requires human experts to label the data.
c. Supervised learning can be used for regression and classification tasks.
d. Unsupervised learning aims to find hidden patterns in the data.

```Answer :- c. Supervised learning can be used for regression and classification tasks.
d. Unsupervised learning aims to find hidden patterns in the data.```

14. One of the ways to mitigate overfitting is

a. By increasing the model complexity
b. By reducing the amount of training data
c. By adding more features to the model
d. By decreasing the model complexity

```Answer :- d. By decreasing the model complexity
```

15. How many Boolean functions are possible with N features?

a. 2^(2^N)
b. 2^N
c. N^2
d. 4^N

`Answer :- a. 2^(2^N) (Not Sure)`
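
The count in Question 15 can be checked by brute force for small N: N Boolean features give 2^N input rows, and a Boolean function assigns one output bit to each row. The sketch below enumerates every such truth table; the function name is illustrative:

```python
from itertools import product

def count_boolean_functions(n):
    """Count distinct truth tables over n Boolean inputs by enumeration."""
    rows = list(product((0, 1), repeat=n))       # 2**n input combinations
    tables = product((0, 1), repeat=len(rows))   # one output bit per row
    return sum(1 for _ in tables)

counts = {n: count_boolean_functions(n) for n in (1, 2, 3)}
print(counts)
```

The enumerated counts agree with 2^(2^N) for N = 1, 2, and 3. Enumeration becomes infeasible quickly, since N = 5 already gives 2^32 functions.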
