
Gradient Descent Practice Questions

📝 Gradient Descent Quiz (20 Questions)

Multiple Choice Questions (MCQs)

Topics Covered: Gradient Descent, Types of Gradient Descent, and the Advantages and Disadvantages of Each Type



Gradient Descent is primarily used for:


(a) Classification


(b) Optimization


(c) Data Cleaning


(d) Visualization

Answer: (b) Optimization


Which of the following update rules corresponds to Batch Gradient Descent?


(a) θ = θ − η ∇J(θ; x(i), y(i))


(b) θ = θ − η ∇J(θ)


(c) θ = θ − η/m Σ ∇J(θ; x(i), y(i))


(d) None of these

Answer: (b) θ = θ − η ∇J(θ)
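For reference, here is a minimal NumPy sketch of this batch update rule applied to a mean-squared-error loss for linear regression; the toy data, learning rate, and iteration count are illustrative choices, not part of the quiz.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_iters=100):
    """Batch GD: theta = theta - eta * grad J(theta), where the
    gradient is computed over the ENTIRE dataset each iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m  # full-dataset gradient of the MSE loss
        theta -= eta * grad               # one stable update per full pass
    return theta

# Toy usage: fit y = 2x with a bias column
X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 * np.arange(5.0)
print(batch_gradient_descent(X, y))  # approaches [0, 2]
```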


In Stochastic Gradient Descent (SGD), parameter updates are done using:


(a) Entire dataset


(b) Mini-batch of data


(c) Single training example


(d) No data

Answer: (c) Single training example
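For contrast with the batch rule above, a minimal sketch of SGD, where θ is updated after each single (shuffled) training example; the squared-error loss and toy data here are assumptions for illustration.

```python
import numpy as np

def sgd(X, y, eta=0.05, n_epochs=50, seed=0):
    """SGD: theta = theta - eta * grad J(theta; x_i, y_i),
    i.e. one noisy update per single training example."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):   # visit examples in random order
            err = X[i] @ theta - y[i]  # error on one example only
            theta -= eta * err * X[i]  # per-example gradient step
    return theta

X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 * np.arange(5.0)
print(sgd(X, y))  # noisier path, but also approaches [0, 2]
```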


Which type of Gradient Descent is the most memory-intensive?


(a) Batch GD


(b) Stochastic GD


(c) Mini-Batch GD


(d) All are equal

Answer: (a) Batch GD


Which Gradient Descent is most suitable for online learning?


(a) Batch GD


(b) Mini-Batch GD


(c) Stochastic GD


(d) None

Answer: (c) Stochastic GD


What is the role of the learning rate (η) in Gradient Descent?


(a) Controls step size of updates


(b) Determines model accuracy


(c) Chooses loss function


(d) Increases dataset size

Answer: (a) Controls step size of updates


A very high learning rate can cause:


(a) Faster convergence


(b) Overshooting the minimum


(c) Perfect optimization


(d) Slow convergence

Answer: (b) Overshooting the minimum
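The overshooting effect is easy to verify on the one-dimensional loss J(θ) = θ², whose gradient is 2θ: each update gives θ ← θ − η·2θ = (1 − 2η)θ, so once η exceeds 1 the factor |1 − 2η| is greater than 1 and the iterates diverge. A small sketch (the η values are chosen purely for illustration):

```python
def descend(theta, eta, steps=10):
    """Gradient descent on J(theta) = theta**2, whose gradient is 2*theta."""
    for _ in range(steps):
        theta -= eta * 2 * theta  # theta <- (1 - 2*eta) * theta
    return theta

print(descend(1.0, eta=0.1))  # ~0.107: small steps converge toward the minimum at 0
print(descend(1.0, eta=1.5))  # 1024.0: |1 - 2*eta| = 2, each step overshoots further
```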


Which type of Gradient Descent achieves a balance between SGD and Batch GD?


(a) Mini-Batch GD


(b) Stochastic GD


(c) Online GD


(d) Adaptive GD

Answer: (a) Mini-Batch GD
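A minimal sketch of that middle ground: gradients are averaged over small random batches, so updates are less noisy than SGD yet far cheaper per step than batch GD. The batch size of 2 and the toy data are illustrative assumptions.

```python
import numpy as np

def minibatch_gd(X, y, eta=0.1, batch_size=2, n_epochs=50, seed=0):
    """Mini-batch GD: average the gradient over a small random batch,
    then take one step, a compromise between SGD and batch GD."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        idx = rng.permutation(m)               # reshuffle each epoch
        for start in range(0, m, batch_size):
            b = idx[start:start + batch_size]  # indices of one mini-batch
            grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)
            theta -= eta * grad
    return theta

X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 * np.arange(5.0)
print(minibatch_gd(X, y))  # approaches [0, 2]
```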


The gradient ∇J(θ) represents:


(a) Loss value


(b) Slope of the loss function with respect to parameters


(c) Accuracy


(d) Prediction error only

Answer: (b) Slope of the loss function with respect to parameters
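One concrete way to see ∇J(θ) as a slope is a finite-difference check: nudge one parameter slightly and measure how much J changes. The sketch below compares that numerical slope with the analytic gradient 2θ of J(θ) = Σ θⱼ², a loss chosen purely for illustration.

```python
import numpy as np

def loss(theta):
    return np.sum(theta ** 2)  # J(theta) = sum_j theta_j^2

def numerical_gradient(J, theta, eps=1e-6):
    """Central differences: the slope of J along each parameter axis."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

theta = np.array([1.0, -2.0])
print(numerical_gradient(loss, theta))  # ~[ 2. -4.]
print(2 * theta)                        # analytic gradient: [ 2. -4.]
```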


Which is a major advantage of Batch GD?


(a) Faster convergence


(b) Stable updates


(c) Handles large datasets well


(d) Very low memory need

Answer: (b) Stable updates


One-Word / Short Answer Questions

Fill in the blank: Gradient Descent minimizes the __________ function.

Answer: Loss (or error)


In SGD, updates are __________ compared to Batch GD.

Answer: Noisier / More frequent


The symbol θ usually represents __________ in Gradient Descent.

Answer: Model parameters (weights, biases)


Which Gradient Descent requires storing the entire dataset in memory?

Answer: Batch Gradient Descent


Give one use case of Batch GD.

Answer: Smaller datasets / Convex optimization problems


Which Gradient Descent is often used when data comes in a streaming form?

Answer: Stochastic Gradient Descent


Mini-batch size is denoted by the letter __________.

Answer: m


A mini-batch size that is too small can lead to __________ performance.

Answer: Suboptimal


The step taken in each iteration is determined by the __________.

Answer: Learning rate


Name two hyperparameters that need careful tuning in Gradient Descent.

Answer: Learning rate, Mini-batch size


✨ This set should give you enough practice to test the concepts, formulas, and pros & cons of each type of Gradient Descent.
