AWS DATA ENGINEERING
1. Introduction to AWS
Fundamentals of AWS Services
Fundamentals of IAM
AWS Management Console
Setting up AWS CLI
2. Big Data Ecosystem on AWS
Data Collection systems
Durability and availability of the data collection approach
Optimizing the operational characteristics of the storage solution
Data access and retrieval patterns
Appropriate data structures and storage formats
3. S3 and Data Lake
Amazon S3 Standard for frequent data access
Amazon S3 Standard-Infrequent Access (S3 Standard-IA) for infrequent data access
Storage classes
Geographic storage of S3 data (AWS Regions)
S3 Cross-region Replication
S3 Lifecycle rules
S3 pricing concerns
S3 Security
Hosting a Static Website on Amazon S3
S3 Access Control
IAM Roles for S3 Access
Creating bucket policies
S3 Data Management
S3 Analytics
4. AWS Databases: RDS and DynamoDB
AWS Database services
Launching an RDS Aurora instance
Connecting to RDS Aurora
Backup and restore
Securing RDS
High-availability using Multi-AZ
Read replicas
Working with DynamoDB
Features and use cases of DynamoDB
5. Data Warehouse: Amazon Redshift
Redshift Architecture
Redshift Security
Loading the data
Query Design
Table design
Best practices
Workload Management
Restore and Snapshot
Overview of Redshift Spectrum
6. ETL with AWS Glue
IAM Permissions for AWS Glue
DNS in Your VPC
Environment to Access Data Stores
Environment for Development Endpoints
AWS Glue Console Workflow Overview
Populating the AWS Glue Data Catalog
Authorizing Jobs
Running and Monitoring
ETL Programming
AWS Glue API
7. Data Processing with EMR and Kinesis
AWS Big Data Ecosystem
EMR introduction
EMR Architecture
EMR Operation
Spark on EMR
Hadoop on EMR
Kinesis Data Streams
Kinesis Agent
Kinesis producers and consumers
8. Querying with Amazon Athena
Amazon Athena Overview and Setup
Data Catalogs
Integration with AWS Glue