PySpark MCQ Solution - Part 5


1. You want to save an RDD in a simple format consisting of pickled Python objects. Which of the following supports this?

  1. RDD.saveAsPickleFile
  2. SparkContext.pickleFile

a.1
b.2
c.Both of these (correct)
d.None of these


2. Which of these are examples of basic streaming sources provided for Spark Streaming?

  1. File systems
  2. Socket connections
  3. Utility Classes

a.1 and 3
b.2 and 3
c.1 and 2 (correct)
d.All of these



3. Read the given statements carefully and choose the correct option.

  1. The number of cores allocated to the Spark Streaming application must be more than the number of receivers.
  2. When running a Spark Streaming program locally, "local" should be used as the master URL.

a.1 (correct)
b.2
c.Both of these
d.None of these


4. Assume that you have set the enforceSchema parameter of the csv function in the pyspark.sql.DataFrameReader interface.

To which of these is the specified schema forcibly applied?
a.Datasource files
b.Headers in CSV files
c.First header in RDD
d.Both a and b (correct)


5. You are working with the PySpark API. If you are required to connect to a Spark cluster, which of the following conditions must be met in this scenario?

  1. Handle authentication
  2. Information specific to your cluster
  3. Information specific to all clusters

a.1
b.2
c.3
d.1 and 2 (correct)


6. What is the default smoothing parameter used for training a Naive Bayes model, given an RDD of (label, features) vectors, while working with the classification.NaiveBayes module in PySpark 2.4.4?

a.0.01
b.1.0 (correct)
c.0.05
d.2


7. Which of the following is one common way to create PySpark RDDs?

a.Using the PySpark parallelize() function (correct)
b.Using the PySpark filter() function
c.Using the PySpark internalize() function
d.Using the PySpark parallel() function


8. Which of the following statements justify the claim that PySpark is based on the functional paradigm?

  1. Scala is functional-based.
  2. Functional code is much easier to parallelize
  3. Makes it very readable

a.1
b.1 and 2 (correct)
c.1 and 3
d.2 and 3


9. Which of the following PySpark interfaces provides a variety of ways to submit PySpark programs, including the PySpark Shell and the spark-submit command?

a.Command Line Interface (correct)
b.PySpark Shell Interface
c.Cluster Interface
d.Spark Interface


10. PySpark RDDs support two primary types of operations. What are these two operations called?

a.1. Transformations 2. Actions (correct)
b.1. Transactions 2. Actions
c.1. Transactions 2. Transformations
d.1. Streaming 2. Actions
