Gradient Descent Practice Questions
📝 Gradient Descent Quiz (20 Questions)
Multiple Choice Questions (MCQs)
Topics covered: Gradient Descent, the types of Gradient Descent, and the advantages and disadvantages of each type.
Gradient Descent is primarily used for:
(a) Classification
(b) Optimization
(c) Data Cleaning
(d) Visualization
Answer: (b) Optimization
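For a concrete picture before the formula questions, here is a minimal sketch of the optimization loop on a toy one-dimensional loss f(θ) = θ², whose gradient is 2θ. The starting point and learning rate below are illustrative choices, not part of the quiz:

```python
# Minimal gradient descent on f(theta) = theta**2, whose gradient is 2*theta.
theta = 5.0   # illustrative starting point
eta = 0.1     # illustrative learning rate
for step in range(50):
    grad = 2 * theta            # derivative of f at the current theta
    theta = theta - eta * grad  # the gradient descent update
print(theta)  # ends up very close to 0, the minimizer of f
```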
Which of the following update rules corresponds to Batch Gradient Descent?
(a) θ = θ − η ∇J(θ; x(i), y(i))
(b) θ = θ − η ∇J(θ)
(c) θ = θ − η/m Σ ∇J(θ; x(i), y(i))
(d) None of these
Answer: (b) θ = θ − η ∇J(θ)
In Stochastic Gradient Descent (SGD), parameter updates are done using:
(a) Entire dataset
(b) Mini-batch of data
(c) Single training example
(d) No data
Answer: (c) Single training example
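To tie the last two questions together, here is a hedged sketch contrasting the Batch GD update (gradient averaged over all m examples) with the SGD update (gradient from a single example). The linear-regression/MSE loss and the function names are assumptions for illustration only:

```python
import numpy as np

def batch_gd_step(theta, X, y, eta):
    """One Batch GD step: the gradient is averaged over ALL m examples."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m    # ∇J(θ) for a mean-squared-error loss
    return theta - eta * grad

def sgd_step(theta, x_i, y_i, eta):
    """One SGD step: the gradient uses a SINGLE example (x_i, y_i)."""
    grad = x_i * (x_i @ theta - y_i)    # ∇J(θ; x(i), y(i))
    return theta - eta * grad
```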
Which type of Gradient Descent is the most memory-intensive?
(a) Batch GD
(b) Stochastic GD
(c) Mini-Batch GD
(d) All are equal
Answer: (a) Batch GD
Which Gradient Descent is most suitable for online learning?
(a) Batch GD
(b) Mini-Batch GD
(c) Stochastic GD
(d) None
Answer: (c) Stochastic GD
What is the role of the learning rate (η) in Gradient Descent?
(a) Controls step size of updates
(b) Determines model accuracy
(c) Chooses loss function
(d) Increases dataset size
Answer: (a) Controls step size of updates
A very high learning rate can cause:
(a) Faster convergence
(b) Overshooting the minimum
(c) Perfect optimization
(d) Slow convergence
Answer: (b) Overshooting the minimum
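Overshooting is easy to demonstrate on the same toy loss f(θ) = θ², where each update multiplies θ by (1 − 2η), so any η > 1 makes |θ| grow every step. The learning rates below are illustrative:

```python
def run_gd(eta, theta=1.0, steps=10):
    """Gradient descent on f(theta) = theta**2 (gradient 2*theta)."""
    for _ in range(steps):
        theta = theta - eta * 2 * theta   # each step scales theta by (1 - 2*eta)
    return theta

print(run_gd(eta=0.1))   # ~0.107: steady progress toward the minimum at 0
print(run_gd(eta=1.1))   # ~6.19: |theta| grows every step (overshooting)
```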
Which type of Gradient Descent achieves a balance between SGD and Batch GD?
(a) Mini-Batch GD
(b) Stochastic GD
(c) Online GD
(d) Adaptive GD
Answer: (a) Mini-Batch GD
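A rough sketch of one Mini-Batch GD epoch, showing the balance: more stable than per-example SGD updates, cheaper per step than full Batch GD. The batch size, shuffling scheme, and MSE gradient are illustrative assumptions:

```python
import numpy as np

def minibatch_gd_epoch(theta, X, y, eta, batch_size=32):
    """One epoch of Mini-Batch GD: shuffle once, then update per batch."""
    m = len(y)
    order = np.random.permutation(m)            # reshuffle the data each epoch
    for start in range(0, m, batch_size):
        batch = order[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = Xb.T @ (Xb @ theta - yb) / len(batch)  # averaged over the batch
        theta = theta - eta * grad
    return theta
```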
The gradient ∇J(θ) represents:
(a) Loss value
(b) Slope of the loss function with respect to parameters
(c) Accuracy
(d) Prediction error only
Answer: (b) Slope of the loss function with respect to parameters
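One way to make "slope" concrete: the gradient can be approximated numerically with central differences. A small sketch, where the loss function used is just a stand-in:

```python
def numerical_gradient(J, theta, eps=1e-6):
    """Approximate dJ/dtheta as the local slope of J via central differences."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

# For J(theta) = theta**2 the exact gradient is 2*theta, so at theta = 3.0
# the slope should be about 6.0:
print(numerical_gradient(lambda t: t ** 2, 3.0))  # ≈ 6.0
```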
Which is a major advantage of Batch GD?
(a) Faster convergence
(b) Stable updates
(c) Handles large datasets well
(d) Very low memory need
Answer: (b) Stable updates
One-Word / Short Answer Questions
Fill in the blank: Gradient Descent minimizes the __________ function.
Answer: Loss (or Error) function
In SGD, updates are __________ compared to Batch GD.
Answer: Noisy / Frequent
The symbol θ usually represents __________ in Gradient Descent.
Answer: Model parameters (weights, biases)
Which Gradient Descent requires storing the entire dataset in memory?
Answer: Batch Gradient Descent
Give one use case of Batch GD.
Answer: Smaller datasets / Convex optimization problems
Which Gradient Descent is often used when data comes in a streaming form?
Answer: Stochastic Gradient Descent
Mini-batch size is denoted by the letter __________.
Answer: m
A mini-batch size that is too small can lead to __________ performance.
Answer: Suboptimal
The step taken in each iteration is determined by the __________.
Answer: Learning rate
Name two hyperparameters that need careful tuning in Gradient Descent.
Answer: Learning rate, Mini-batch size
✨ This set should give you enough practice with the concepts, formulas, and pros and cons of each type of Gradient Descent.