Description
PySpark SQL Recipes, 1st ed.
With HiveQL, Dataframe and Graphframes
Authors: Mishra Raju Kumar, Raman Sundar Rajan
Language: English
Subjects for PySpark SQL Recipes:
Keywords
PySpark; PySpark SQL; NoSQL; GraphFrames; Data Processing; Spark Streaming; Big Data; Python
Approximate price: 47.46 €
In Print (Delivery period: 15 days).
323 p. · 15.5 x 23.5 cm · Paperback
- Understand PySpark SQL and its advanced features
- Use SQL and HiveQL with PySpark SQL
- Work with structured streaming
- Optimize PySpark SQL
- Master GraphFrames and graph processing
Chapter Goal: The reader will understand PySpark, PySparkSQL, the Catalyst optimizer, Project Tungsten, and Hive.
No. of pages: 20-30
Sub-topics:
1. PySpark
2. PySparkSQL
3. Hive
4. Catalyst
5. Project Tungsten
Chapter 2: Some time with Installation
Chapter Goal: The learner will understand how to install Spark, Hive, PostgreSQL, MySQL, MongoDB, Cassandra, etc.
No. of pages: 30-40
Sub-topics:
1. Installing Spark
2. Installing Hive
3. Installing MySQL
4. Installing MongoDB
Chapter 3: IO in PySparkSQL
Chapter Goal: This chapter provides recipes that enable the reader to create PySparkSQL DataFrames from different sources.
No. of pages: 40-50
Sub-topics:
1. Creating a DataFrame from data
2. Reading a CSV file to create a DataFrame
3. Reading a JSON file to create a DataFrame
4. Saving DataFrames in different formats
Chapter 4: Operations on PySparkSQL DataFrames
Chapter Goal: The reader will learn about data filtering, data manipulation, descriptive data analysis, dealing with missing values, etc.
No. of pages: 40-50
1. Data filtering
2. Data manipulation
3. Row and column manipulation
Chapter 5: Data Merging and Data Aggregation Using PySparkSQL
Chapter Goal: The reader will learn about data merging and aggregation using PySparkSQL.
1. Data Merging
2. Data aggregation
Chapter 6: SQL, NoSQL and PySparkSQL
Chapter Goal: The reader will learn to run SQL and HiveQL queries on DataFrames.
No. of pages: 30-40
Sub-topics:
1. Running SQL on DataFrames
2. Running HiveQL
Chapter 7: Structured Streaming
Chapter Goal: The reader will understand structured streaming.
No. of pages: 30-40
1. Different types of modes
2. Data aggregation in structured streaming
3. Different types of sources
Chapter 8: Optimizing PySparkSQL
Chapter Goal: The reader will learn about optimizing PySparkSQL.
No. of pages: 20-30
Optimizing PySparkSQL
Chapter 9: GraphFrames
Chapter Goal: The reader will understand graph data analysis with GraphFrames.
No. of pages: 30-40
1. GraphFrame Creation
2. PageRank
3. Breadth-First Search
Explains PySpark SQL and DataFrames in detail
Includes I/O operations using PySpark SQL with the most frequently used SQL and NoSQL databases
Detailed discussion of data preprocessing using PySpark SQL
Problem-solution approach to graph-based algorithms using GraphFrames