Data and Microsoft Azure

The Experts: Jason Howell,
Azim Uddin, Lisa Liu, Adam Saxton,
Rohit Nayak
SQLintersection
Tuesday, 3:45-5:00pm
Do you know Data in the Microsoft Cloud?
Upgrade Your Life
Bob Ward, CTO CSS, Microsoft
http://aka.ms/bobwardms
What is Microsoft Azure?
Microsoft Azure is a growing collection of integrated cloud services (analytics, computing, database, mobile, networking, storage, and web) for moving faster, achieving more, and saving money.
• Use an open and flexible platform
• Extend your existing IT
• Run your apps anywhere
• Scale as you need, pay as you go
• Make smarter decisions
• Protect your data
• Rely on a trusted cloud
The Cloud for Modern Business
© SQLintersection. All rights reserved.
http://www.SQLintersection.com
The “No Excuse” Cloud Checklist
Azure Trust Center
“I’m concerned about cloud security and privacy”
• Internal security measures, isolation and control, and compliance
“The cloud doesn’t give me the control I need”
• Azure Virtual Machines allow you complete control of your SQL Server instance and configuration
“I can’t rely on the speed of the public Internet”
• Azure ExpressRoute to get WAN network speeds up to 10Gbps
• Data centers in 20 regions around the world
“The cloud doesn’t offer the features I need”
• Azure SQL Database now contains features such as full-text search, TDE, and row-level security
“I can’t move everything to the cloud”
• Hybrid is one of the key differentiators for Azure, including Azure Backup, SQL Stretch Database, and secondary replicas in Azure VMs
Data and Microsoft Azure
Data
• SQL in Azure Virtual Machine
• Azure SQL Database
• Azure SQL Data Warehouse (Preview)
• Azure DocumentDB
• Azure HDInsight
• Azure Data Lake (Preview coming)
Analytics
• Azure Data Factory
• Azure Machine Learning
• Power BI
• Stream Analytics, Search, and Data Catalog (Preview)
SQL Server in Azure Virtual Machine
• Complete SQL Server “box” running in a VM
• The VM is hosted in a Microsoft data center with a variety of sizes
• SQL Server license is “subscription based” or “bring your own”
• VM usage is “pay as you use it”
• Use the Marketplace to avoid the pain of setup and configuration
• Hybrid scenarios available and welcome
[Diagram: Infrastructure as a Service (IaaS), with labels You, Microsoft, VM, and DB]
Azure SQL Database
• Deploy a database and application. A managed service
• Built-in High Availability, Geo-Replication, Restore, and Auditing
• Basic, Standard, and Premium Service Tiers (determine features, price, and performance)
• Predictable performance via DTUs that can be dynamically adjusted
• Close to box parity, and new features are here first (row-level security and full-text)
• Elastic pools to manage a large number of databases
• Elastic scale-out for distributed performance
• Stretch Database in SQL Server 2016 for historical archive scenarios
[Diagram: Platform as a Service (PaaS), with labels You, Microsoft, VM, and DB]
Azure SQL Database is Ready
• Premium now supports 1TB database
• Point in Time Restore, GeoRestore, Geo and Active Geo-Replication
• Application isolation and predictable performance with Service Tiers
• CLR, XML, Change Tracking, and Full-Text Search now supported
• Azure Active Directory support as alternative for SQL authentication
• Columnstore index support
• Security features such as row-level security, auditing, TDE
• Monitor and troubleshoot with Query Store, Index Advisor, and Extended Events
Demo
Let’s explore Azure SQL Database
Azure SQL Data Warehouse Preview
• Think APS and SQL Server together in the cloud
• Fully managed (PaaS) like Azure SQL Database
• Full PolyBase support for queries that span SQL and Hadoop environments
• Petabytes of data storage along with MPP for scale-out queries
• Power BI direct connectivity
• Separate usage and costs for compute (DWU) and storage
• Solution Partners for ETL solutions
Azure DocumentDB
A fully-managed NoSQL document database service
Try Query Playground
Don’t think Word document
A JSON DBMS
Why NoSQL document?
• Schema-free data storage (no tables and columns predefined)
• JSON document storage (preferred by many web developers)
Why database?
• Supports transaction processing
• Supports database consistency
• Automatically indexes JSON documents
• Engine programming via procs, triggers, and UDFs written in JavaScript
• A SQL language query interface built-in
Who would use this?
• JSON has become a popular choice for schema-free, unstructured data
• Natural choice for any developer working with JSON formatted data
JSON support in SQL 2016
{
  "id": "1",
  "Team": "Dallas Cowboys",
  "Players": [
    {
      "Name": "Tony Romo",
      "Jersey": 9,
      "Position": "QB"
    },
    {
      "Name": "Dez Bryant",
      "Jersey": 88,
      "Position": "WR"
    }
  ],
  "Conference": "NFC",
  "Prediction": 1
}
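To make the "schema-free JSON document" idea concrete, here is a minimal Python sketch that parses the sample team document with the standard `json` module and filters its nested Players array in memory. It does not use DocumentDB or its SQL query interface; the helper name `players_at` is hypothetical and only illustrates the kind of question a query over this document would answer.

```python
import json

# The sample team document from the slide, written as valid JSON.
TEAM_JSON = """
{
  "id": "1",
  "Team": "Dallas Cowboys",
  "Players": [
    {"Name": "Tony Romo", "Jersey": 9, "Position": "QB"},
    {"Name": "Dez Bryant", "Jersey": 88, "Position": "WR"}
  ],
  "Conference": "NFC",
  "Prediction": 1
}
"""


def players_at(document, position):
    """Return the names of players at the given position (hypothetical helper)."""
    return [
        player["Name"]
        for player in document.get("Players", [])
        if player.get("Position") == position
    ]


team = json.loads(TEAM_JSON)
print(team["Team"], players_at(team, "QB"))
```

No table or column definitions were needed up front: the structure is discovered from the document itself when it is read.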
Azure HDInsight
A semi-managed Hadoop based data and analytics service
We simplify the deployment of Hadoop clusters
• Distributed processing of large (“Big”) data across a cluster of computers
• Supports both Windows and Linux environments, all running in Azure
• Storage based on HDFS that is “schema on read”
• Job scheduling and resource management through YARN
• Parallel processing of data through MapReduce (think batch processing)
Hortonworks Data Platform (HDP) components
• Batch – MapReduce
• Script – Pig
• SQL – Hive
• NoSQL – HBase
• Streaming – Storm
• In-Memory – Spark
• Ambari, Mahout, Oozie, Phoenix, Sqoop, Tez, ZooKeeper
• Core Engine – HDFS and YARN
• Azure Storage Blob (WASB)
MapReduce Explained
Think “batch jobs”
Example: “Show me the count of ERRORLOG entries for spid3s”
Map
• Take <key,value> input and “map” this into an output.
• Imagine multiple parallel “map” tasks doing this work.
• ERRORLOG.1 → Spid3s:5
• ERRORLOG.100 → Spid3s:10
• ERRORLOG.1000 → Spid3s:2
Reduce
• Take the output of the map tasks and combine or process them into the final output
• Spid3s:5 + Spid3s:10 + Spid3s:2 → 17
Your schema is in your code on “read”
HIVE provides a SQL “like” experience
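The map and reduce steps can be sketched in plain Python, with no Hadoop involved. This is an illustration only: the sample log lines and the function names `map_count` and `reduce_counts` are made up, and a real job would run the map tasks in parallel across the cluster. Each map task emits one (key, count) pair per file, and the reduce step sums the pairs.

```python
# Hypothetical sample data: file name -> log lines. Real ERRORLOG files are far larger.
ERRORLOGS = {
    "ERRORLOG.1": [
        "spid3s Starting up database 'master'.",
        "spid7s Recovery is complete.",
        "spid3s Clearing tempdb database.",
    ],
    "ERRORLOG.100": [
        "spid3s Starting up database 'model'.",
        "spid12s Login succeeded.",
    ],
    "ERRORLOG.1000": [
        "spid9s Server is listening.",
    ],
}


def map_count(lines, key="spid3s"):
    """Map task: scan one file and emit a (key, count) pair.

    The "schema" lives here: this code decides that the first token is the spid.
    """
    count = sum(1 for line in lines if line.split()[:1] == [key])
    return (key, count)


def reduce_counts(pairs):
    """Reduce task: combine the per-file pairs into one total per key."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals


# Each call to map_count is independent, so these could run as parallel tasks.
mapped = [map_count(lines) for lines in ERRORLOGS.values()]
print(reduce_counts(mapped))
```

Because the parsing rule sits in `map_count` rather than in the stored files, changing the question only means changing the code, which is the "schema on read" idea.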
Azim Uddin
Big Data Support Blog
Demo
Using HDInsight HIVE to analyze
SQL ERRORLOG files
Azure Data Lake Preview Coming
Jobs instead of clusters
Used by many services
• A fully managed store and analytics service
• Built on Azure Blob Storage
• Based on YARN and HDFS
• Rich Visual Studio Development Environment
• U-SQL makes it easy for the SQL Professional
@t = EXTRACT date string
           , time string
           , author string
           , tweet string
     FROM "/input/MyTwitterHistory.csv"
     USING Extractors.Csv();

@res = SELECT author
            , COUNT(*) AS tweetcount
       FROM @t
       GROUP BY author;

OUTPUT @res TO "/output/MyTwitterAnalysis.csv"
ORDER BY tweetcount DESC
USING Outputters.Csv();
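For readers newer to U-SQL, the same extract, group, and order steps can be mimicked in Python on a small in-memory CSV. This is only an analogy for what the script computes, not how Data Lake Analytics executes it; the sample rows and the function name `tweet_counts` are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the same column order as the script: date, time, author, tweet.
SAMPLE_CSV = "\n".join([
    "2015-10-01,09:00,alice,Hello Azure",
    "2015-10-01,09:05,bob,Trying out U-SQL",
    "2015-10-02,10:30,alice,Data Lake looks neat",
])


def tweet_counts(csv_text):
    """Count rows per author and return (author, tweetcount) pairs, highest count first."""
    reader = csv.reader(io.StringIO(csv_text))
    counts = Counter(row[2] for row in reader if row)  # third column is the author
    return counts.most_common()


for author, tweetcount in tweet_counts(SAMPLE_CSV):
    print(author, tweetcount)
```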
Azure Data Factory
• Think SSIS and SQL Agent capabilities as a managed cloud service
• Linked Services – Data stores or compute services
• Datasets – Input data source or output destination
• Pipelines – Orchestrated activities using datasets and linked services
• Linked services, datasets, and pipelines are all defined with JSON
• Develop in Azure Portal, PowerShell, or Visual Studio
• Schedule and monitor your pipelines
Example pipeline (diagram)
• Steps: Convert ERRORLOG files to UTF-8; Copy to Azure Storage; Execute HIVE DDL; Execute HIVE Queries; Copy results into Azure Database; Power BI Dashboard (Direct Connect)
• Activities: Copy Activity, HIVE Query Activity, Copy Activity
• Linked Services: onPremisesFileServer, AzureStorage, HDInsightOnDemand, AzureSQLDatabase
• DataSets: FileShare, AzureBlob, AzureBlob, AzureSQLTable
• OnDemand – Data Factory creates and deletes HDInsight clusters as needed
Azure Machine Learning
Try it free
“Learning is any process by which a system improves performance from experience” – Herbert Simon
Azure Machine Learning = cloud computing platform to build, test, deploy, and publish predictive models (experiments)
A predictive model contains…
• Define your problem (“I want to predict the winner of Super Bowl 50”)
• Ingest, clean, and aggregate data from past experiences and current data
• Build a model using “learning tasks”, algorithms, modules, and a “flow”
• Deploy and run your model
Machine Learning Studio allows you to build and deploy prediction models
• Reuse libraries of algorithms and modules
• Enhance with custom R and Python scripts
• Publish your work as a web service to be consumed in the Marketplace
• Use Cortana Analytics Gallery to get a jump start
T-SQL R language integration coming to SQL Server 2016
Azure ML Cheat Sheet
Power BI
Free or Power BI Pro
A true self-service BI and reporting solution
• Based on our own data and analysis technologies running in Azure
• Focuses on dashboards, self-service reports, and Q&A
Get data from on-premises, file, Azure, or SaaS stores and services
• Gateways to schedule refresh for on-premises data sources
• Azure stores and services directly connected. Use Stream Analytics for a “live” feed
A rich development and publishing system
• Power BI Desktop app allows you to develop offline and publish
• Content Packs make it easy to distribute, share, and consume
• Want more visuals? Check out the new community gallery.
Power BI Mobile
• iPhone and Android
• Windows application
Demo
Using Power BI and SQL Server
Does this stuff work together?
References
• Understanding Azure SQL Database and SQL Server in Azure VMs
• Spotlight on SQL Database Active Geo-Replication
• Azure SQL Database Benchmark Overview
• Data Analytic Scenarios
• Getting Started with Azure SQL Data Warehouse
• Working with NoSQL Data in DocumentDB
• Hadoop tutorial: Get started using Hadoop with Hive in HDInsight on Linux
• Building Big Data Applications Using Azure HDInsight Service
• Build your first pipeline using Azure Data Factory
• Build your first Machine Learning Experiment
• Azure Friday Videos
Review
• Azure has the security, privacy, scale, flexibility, and speed to meet the needs of production applications
• Azure SQL Database is a fully managed database service using the features of SQL Server
• Azure SQL Data Warehouse combines the power of APS and SQL Server
• Azure HDInsight provides a semi-managed Hadoop suite of services
• Azure Data Lake is a fully-managed store and analytics service for big data at scale
• Azure Data Factory provides data pipeline and orchestration services
• Power BI brings a true self-service analytics and reporting solution
• Azure Machine Learning is a fully managed predictive model cloud service
Sign up free today and get $200 credit on Azure Services
MSDN subscribers get free Azure benefits
Questions?
Don’t forget to complete an online evaluation on EventBoard!
Do you know Data in the Microsoft Cloud?
Your evaluation helps organizers build better conferences
and helps speakers improve their sessions.
Thank you!