Syntax:
dataframe.collect()
Example:
In this example, we will create a PySpark DataFrame with 6 columns and 5 rows and display the dataframe in a tabular format using the show() method.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students = [{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
            {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
            {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
            {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
            {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame(students)
# dataframe
df.show()
Output:
+-------+---+------+----------------+------+------+
|address|age|height|            name|rollno|weight|
+-------+---+------+----------------+------+------+
| guntur| 23|  5.79|          sravan|   001|    67|
|    hyd| 16|  3.79|          ojaswi|   002|    34|
|  patna|  7|  2.79|gnanesh chowdary|   003|    17|
|    hyd|  9|  3.69|          rohith|   004|    28|
|    hyd| 37|  5.59|         sridevi|   005|    54|
+-------+---+------+----------------+------+------+
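The show() method also accepts optional parameters that control how much is printed. The snippet below is a minimal sketch using the standard PySpark parameters n, truncate, and vertical:

# show at most 2 rows without truncating long values
df.show(n=2, truncate=False)
# print each row vertically instead of in a tabular format
df.show(vertical=True)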
Example 2:
Let's display the dataframe using the collect() method.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students = [{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
            {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
            {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
            {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
            {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame(students)
# display the dataframe with collect()
print(df.collect())
Output:
[Row(address='guntur', age=23, height=5.79, name='sravan', rollno='001', weight=67),
Row(address='hyd', age=16, height=3.79, name='ojaswi', rollno='002', weight=34),
Row(address='patna', age=7, height=2.79, name='gnanesh chowdary', rollno='003', weight=17),
Row(address='hyd', age=9, height=3.69, name='rohith', rollno='004', weight=28),
Row(address='hyd', age=37, height=5.59, name='sridevi', rollno='005', weight=54)]
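Note that collect() returns an ordinary Python list of Row objects on the driver, so standard list operations apply to the result. A minimal sketch:

# collect() returns a Python list of Row objects
rows = df.collect()
# number of rows collected
print(len(rows))
# slicing works like on any other list
print(rows[0:2])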
We can also use a for loop with the collect() method to iterate row by row.
Syntax:
for iterator in dataframe.collect():
    print(iterator)
To display particular columns, we have to specify the column name with the iterator using the [] operator.
Syntax:
print(iterator['column1'], iterator['column2'], ...)
Example:
This example iterates over multiple columns in a dataframe with the collect() method.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students = [{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
            {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
            {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
            {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
            {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame(students)
# Display the name column with a for loop
for i in df.collect():
    print(i['name'])
print("------------------------------------")
# Display the name and weight columns with a for loop
for i in df.collect():
    print(i['name'], i['weight'])
print("------------------------------------")
# Display all columns with a for loop
for i in df.collect():
    print(i)
Output:
sravan
ojaswi
gnanesh chowdary
rohith
sridevi
------------------------------------
sravan 67
ojaswi 34
gnanesh chowdary 17
rohith 28
sridevi 54
------------------------------------
Row(address='guntur', age=23, height=5.79, name='sravan', rollno='001', weight=67)
Row(address='hyd', age=16, height=3.79, name='ojaswi', rollno='002', weight=34)
Row(address='patna', age=7, height=2.79, name='gnanesh chowdary', rollno='003', weight=17)
Row(address='hyd', age=9, height=3.69, name='rohith', rollno='004', weight=28)
Row(address='hyd', age=37, height=5.59, name='sridevi', rollno='005', weight=54)
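As a side note, a Row also supports attribute-style access, so i['name'] and i.name return the same value. A minimal sketch using the same dataframe:

# attribute access on a Row is equivalent to the [] operator
for i in df.collect():
    print(i.name, i.address)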
We can also get a particular Row from the result of the collect() method using an index.
In a PySpark DataFrame, indexing starts from 0.
Syntax:
dataframe.collect()[row_index]
Example:
In this example, we are collecting the first, second, and fifth rows.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students = [{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
            {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
            {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
            {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
            {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame(students)
#display first row
print(df.collect()[0])
#display second row
print(df.collect()[1])
#display fifth row
print(df.collect()[4])
Output:
Row(address='guntur', age=23, height=5.79, name='sravan', rollno='001', weight=67)
Row(address='hyd', age=16, height=3.79, name='ojaswi', rollno='002', weight=34)
Row(address='hyd', age=37, height=5.59, name='sridevi', rollno='005', weight=54)
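Because collect() returns a plain Python list, negative indices also work. A minimal sketch:

# a negative index counts from the end of the collected list
# display the last row
print(df.collect()[-1])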
We can also get a particular column in a Row from the collect() method using an index.
In a PySpark DataFrame, indexing starts from 0.
Syntax:
dataframe.collect()[row_index][column_index]
Example:
In this example, we are going to get the values from the first row, first column and from the third row, first column.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students = [{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
            {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
            {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
            {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
            {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame(students)
#display first row first column
print(df.collect()[0][0])
#display third row first column
print(df.collect()[2][0])
Output:
guntur
patna
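Positional column indices depend on the column order; here the columns are ordered alphabetically (as the Row output above shows), so index 0 is the address column. Accessing a column by name is often clearer. A minimal sketch retrieving the same values:

# access the same values by column name instead of position
print(df.collect()[0]['address'])
print(df.collect()[2]['address'])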
Conclusion
In this article, we covered the usage of the collect() method in different scenarios. Finally, we saw that the collect() method is used to get the data row by row from a PySpark DataFrame.