A Point-to-Site (P2S) connection allows you to securely connect individual clients to an Azure private network.
It is useful for remote workers and, for professionals or small organizations where only a few clients need to connect to the cloud environment, it can even replace a Site-to-Site (S2S) connection.
In this scenario, we configured a hub-spoke network topology for a cloud migration project to support business intelligence reporting.
In this post we present some details of the procedures we carried out.
The goal is to offer an overall architectural view that can serve as inspiration for anyone facing a similar scenario.
Scenario
A team of three developers is temporarily working remotely and needs to start a new business intelligence project.
The project involves the migration to the cloud of a SQL Server database hosted on-premises and the creation of some PowerBI reports.
The data currently resides in an on-premises SQL Server database that acts as a mirror of an OLTP system which cannot be modified.
Requirements
Infrastructure:
The connection to Azure must be secure.
No Azure machine may be exposed to the internet via a public IP; access must take place over the RDP protocol to private endpoints only.
Developers use company laptops on which installing additional tools is not allowed.
The SQL Server database migration can be performed offline, since downtime on the reporting database is acceptable.
The cloud storage solution must not involve managing any hardware and must offer a full version of SQL Server.
Two separate servers must be set up, one for testing and one for production, with different inbound/outbound rules.
Access to the cloud SQL Servers must not come directly from the remote clients, but only from the development machines used as jumpboxes.
The disks (OS and data) of the machines hosting the two SQL Server instances must be encrypted.
Development machines have the same configuration and inbound/outbound network security rules.
Costs must be reduced where possible.
Data:
An integration system will need to populate the cloud database with data from the on-premises OLTP system.
The data transfer will take place overnight to make the processed data available in the morning.
The volume of data to be imported daily is variable.
Solution
For the network infrastructure and access from remote clients, a hub-spoke topology with three virtual networks (one hub and two spokes) is implemented.
SQL Server on Azure Virtual Machines is chosen as the storage solution because, at the time of this implementation, it is the Azure SQL offering that provides a full version of SQL Server in the cloud.
The network design includes:
a hub network "datasoluzioni-hub" containing a route-based Virtual Network Gateway, on which a P2S connection using the IKEv2 protocol is configured and to which the three developers' devices connect. Authentication is certificate-based (a configuration sketch follows below).
a "dev-vnet" network hosting the development machines, with a single Network Security Group (NSG) applied at the subnet level. The development machines are configured with automatic shutdown so as not to incur excessive costs during hours of non-use.
a "backend-spoke" network containing three subnets:
"TestSubnet" will host the SQL Virtual Machine for development/testing;
"ProdSubnet" will host the SQL Virtual Machine for production;
"DataGateway" will host a Windows Server 2019 machine on which the PowerBI data gateway and the Self-Hosted Integration Runtime for the Azure Data Factory integration tasks will be installed.
A data gateway is not needed to connect the PowerBI Service to other cloud services such as Azure SQL Database, but SQL Server on an Azure VM does require one.
Microsoft recommends that you do not install it on the same machine that hosts your data source in order to avoid resource contention.
It is also recommended to use an SSD disk for this machine: refresh queries that return a large volume of data are temporarily stored on the gateway machine until all data has been received from the source, and are then sent back to the cloud service.
This process is called spooling.
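Returning to the network configuration: a minimal Azure PowerShell sketch of the hub's route-based gateway with the IKEv2 P2S configuration and certificate authentication might look like the following. Resource names, region, address pool, certificate path, and public IP settings are illustrative assumptions, not the actual values used in the project.

```powershell
# Sketch: route-based VPN gateway with IKEv2 point-to-site and certificate authentication
# (names, region, address ranges and paths are illustrative)
$rg  = "rg-datasoluzioni"
$loc = "westeurope"

$hub   = Get-AzVirtualNetwork -Name "datasoluzioni-hub" -ResourceGroupName $rg
$gwSub = Get-AzVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $hub
$pip   = New-AzPublicIpAddress -Name "hub-gw-pip" -ResourceGroupName $rg -Location $loc `
            -Sku Standard -AllocationMethod Static
$ipCfg = New-AzVirtualNetworkGatewayIpConfig -Name "gw-ipcfg" -SubnetId $gwSub.Id -PublicIpAddressId $pip.Id

# Public data of the self-signed root certificate (DER .cer exported beforehand);
# the developers' devices hold client certificates issued from this root
$rootData = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes("C:\certs\P2SRoot.cer"))
$rootCert = New-AzVpnClientRootCertificate -Name "P2SRoot" -PublicCertData $rootData

New-AzVirtualNetworkGateway -Name "hub-vpn-gw" -ResourceGroupName $rg -Location $loc `
    -IpConfigurations $ipCfg -GatewayType Vpn -VpnType RouteBased -GatewaySku VpnGw1 `
    -VpnClientProtocol IkeV2 -VpnClientAddressPool "172.16.201.0/24" -VpnClientRootCertificates $rootCert
```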
Two NSGs are applied directly to the network interfaces of the Azure SQL Server machines.
A single NSG is applied to the "DataGateway" subnet.
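A minimal sketch of one of the NIC-level NSGs, assuming the development subnet uses the 10.1.0.0/24 address range (prefixes, names, and priorities are assumptions):

```powershell
# Illustrative NSG for a SQL VM network interface: allow SQL and RDP traffic only
# from the development subnet; everything else inbound is blocked by the default deny rule
$sqlRule = New-AzNetworkSecurityRuleConfig -Name "Allow-SQL-From-Dev" -Priority 100 -Direction Inbound `
    -Access Allow -Protocol Tcp -SourceAddressPrefix "10.1.0.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange 1433
$rdpRule = New-AzNetworkSecurityRuleConfig -Name "Allow-RDP-From-Dev" -Priority 110 -Direction Inbound `
    -Access Allow -Protocol Tcp -SourceAddressPrefix "10.1.0.0/24" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange 3389

$nsg = New-AzNetworkSecurityGroup -Name "nsg-sqlvm-prod" -ResourceGroupName "rg-datasoluzioni" `
    -Location "westeurope" -SecurityRules $sqlRule, $rdpRule

# Attach the NSG directly to the VM network interface
$nic = Get-AzNetworkInterface -Name "sqlvm-prod-nic" -ResourceGroupName "rg-datasoluzioni"
$nic.NetworkSecurityGroup = $nsg
$nic | Set-AzNetworkInterface
```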
To allow access to the development machines, a virtual network peering is configured between the hub network and the "dev-vnet" network, with gateway transit enabled on the hub side and use of the remote gateway on the spoke side.
A second virtual network peering is configured between the "dev-vnet" network and the "backend-spoke" network, allowing the developers to reach the SQL Servers from their working machines.
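A sketch of the hub-to-spoke peering with gateway transit, assuming both virtual networks sit in the same (illustrative) resource group:

```powershell
$rg  = "rg-datasoluzioni"
$hub = Get-AzVirtualNetwork -Name "datasoluzioni-hub" -ResourceGroupName $rg
$dev = Get-AzVirtualNetwork -Name "dev-vnet" -ResourceGroupName $rg

# Hub side: allow the spoke to use the hub's VPN gateway
Add-AzVirtualNetworkPeering -Name "hub-to-dev" -VirtualNetwork $hub `
    -RemoteVirtualNetworkId $dev.Id -AllowGatewayTransit

# Spoke side: route P2S client traffic through the remote (hub) gateway
Add-AzVirtualNetworkPeering -Name "dev-to-hub" -VirtualNetwork $dev `
    -RemoteVirtualNetworkId $hub.Id -UseRemoteGateways -AllowForwardedTraffic
```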
The dev/test SQL VM is configured with reduced capacity, while a memory-optimized E-series machine is configured for the production SQL VM.
The SQL VM disks are encrypted with Azure Disk Encryption (ADE), using a key stored in Azure Key Vault.
Note that managed disks are already encrypted by default on the storage side with Azure Storage Service Encryption (SSE) and a platform-managed key (PMK); you can check this from the Azure portal in the disks panel of the VM.
However, if you run the following PowerShell command you will notice that the state of the disks is NotEncrypted.
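For example (resource group and VM names are illustrative):

```powershell
Get-AzVMDiskEncryptionStatus -ResourceGroupName "rg-datasoluzioni" -VMName "SQLVM-prod"

# OsVolumeEncrypted    : NotEncrypted
# DataVolumesEncrypted : NotEncrypted
```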
This is because encrypting the OS and data disks with ADE requires a key, in this scenario managed in Azure Key Vault. Encryption can be enabled with PowerShell or directly from the Azure portal; the result is then visible both from the Get-AzVmDiskEncryptionStatus command and from the disks panel of the machine.
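A minimal sketch of the enable step, assuming a Key Vault that has already been enabled for disk encryption (all names are illustrative):

```powershell
# Enable ADE on the VM using a Key Vault (the vault must have EnabledForDiskEncryption set)
$kv = Get-AzKeyVault -VaultName "kv-datasoluzioni" -ResourceGroupName "rg-datasoluzioni"

Set-AzVMDiskEncryptionExtension -ResourceGroupName "rg-datasoluzioni" -VMName "SQLVM-prod" `
    -DiskEncryptionKeyVaultUrl $kv.VaultUri -DiskEncryptionKeyVaultId $kv.ResourceId -VolumeType All

# OS and data volumes should now report Encrypted
Get-AzVMDiskEncryptionStatus -ResourceGroupName "rg-datasoluzioni" -VMName "SQLVM-prod"
```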
The data migration is performed offline using a .bak file of the on-premises SQL Server database, which is initially restored to the SQLVM-test server.
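A sketch of the restore step using the SqlServer PowerShell module; the database name and backup path are assumptions:

```powershell
# Restore the on-premises backup on the test SQL VM
Import-Module SqlServer
Restore-SqlDatabase -ServerInstance "SQLVM-test" -Database "ReportingDB" `
    -BackupFile "F:\Backup\ReportingDB.bak" -ReplaceDatabase
```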
The new data is fed into SQLVM-test via Azure Data Factory pipelines. The source is the OLTP system connected via Self-Hosted Integration Runtime located on-premises, and the destination is the new SQL Server on Azure VM.
Once development and testing are completed, the test database is migrated to the production SQL Server and connected to the integration pipelines.
The test machine is deallocated, but not removed.
Given the variability of the data volumes to be processed daily, the refresh of the PowerBI model is placed directly at the end of the ETL pipeline, through a dedicated Web Activity that refreshes the dataset in the PowerBI Service via REST API.
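The Web Activity posts to the Power BI REST API refresh endpoint; an equivalent call, with illustrative workspace and dataset IDs, looks like this (the calling identity must have permissions on the workspace):

```powershell
# Trigger a dataset refresh in the PowerBI Service via REST API
# (newer Az.Accounts versions may return the token as a SecureString)
$token = (Get-AzAccessToken -ResourceUrl "https://analysis.windows.net/powerbi/api").Token
$uri   = "https://api.powerbi.com/v1.0/myorg/groups/<workspaceId>/datasets/<datasetId>/refreshes"

Invoke-RestMethod -Method Post -Uri $uri `
    -Headers @{ Authorization = "Bearer $token" } `
    -Body (@{ notifyOption = "NoNotification" } | ConvertTo-Json) -ContentType "application/json"
```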
For cost optimization, the machine hosting the PowerBI Data Gateway/Self-Hosted IR is scheduled to turn on before data loading and is turned off once the model refresh in PowerBI is completed.
This is done with two webhooks that call PowerShell runbooks hosted in an Automation Account.
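A sketch of the webhook-triggered start runbook (resource names are assumptions); the stop runbook is symmetric and calls Stop-AzVM -Force once the refresh has completed:

```powershell
# Runbook invoked by the webhook: authenticate with the Automation Account's
# managed identity and start the gateway VM
param([object]$WebhookData)

Connect-AzAccount -Identity | Out-Null
Start-AzVM -ResourceGroupName "rg-datasoluzioni" -Name "vm-datagateway"
```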
In this case, attention must be paid to the time it takes for Data Factory to see the integration runtime as available: if the ETL pipeline runs while the integration runtime is still offline, it will fail.
In our case, a 5-minute wait is sufficient and considered tolerable.
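Instead of (or in addition to) a fixed wait, the pipeline start could be guarded by polling the Self-Hosted Integration Runtime state; a sketch, with assumed factory and runtime names:

```powershell
# Wait until the self-hosted integration runtime reports Online before triggering the ETL pipeline
do {
    Start-Sleep -Seconds 30
    $ir = Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName "rg-datasoluzioni" `
            -DataFactoryName "adf-datasoluzioni" -Name "SelfHostedIR" -Status
} until ($ir.State -eq "Online")
```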
This design is only one of the possible implementations and makes no claim to be the best or the only one.
Every situation deserves in-depth considerations that are difficult to generalize.