Technologies for BI in transition
Microsoft technologies are constantly evolving. Below, I would like to present several combinations of technical components that can be used to implement professional business intelligence projects today and in the future.
Current technology stack for SQL Server on-premises
On the left side, all possible data sources are shown: databases, Excel files, APIs, REST interfaces, CSV files, SharePoint lists, PDF files and other formats. These can reside in SharePoint, OneDrive, on-premises or in the cloud.
The yellow part usually runs on Windows servers, which can be located either on-premises in the company's own data center or in a dedicated VNet in the Azure cloud.
Power Automate is used via a gateway to move, for example, files located in SharePoint (Microsoft 365) to the file system of these servers. Using the C# programming language and the .NET Framework, these files and the other data sources are copied into SQL Server. Within SQL Server, the data is transformed, structured and connected. From there, it is pre-aggregated via SQL Server Analysis Services and enriched with additional analytical functionality. Excel can consume these pre-aggregated data models directly. Power BI can also consume them via another gateway, eliminating the need to build separate Power BI data models. This allows a single model to be reused for hundreds of Power BI reports.
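Purely as an illustration of the first load step (copying a source file into a SQL Server staging table): the stack described above implements this in C# and .NET, so the following Python/pyodbc sketch, including the server, table, column and file names, is only a hypothetical stand-in for the same pattern.

```python
# Hypothetical sketch of the "file to SQL Server" load step.
# The stack described in the text implements this in C#/.NET; all names here are assumptions.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=onprem-sql01;DATABASE=StagingDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # speeds up the bulk insert below

# Read a file that Power Automate has previously dropped into the server's file system.
with open(r"D:\import\sales.csv", newline="", encoding="utf-8") as f:
    rows = [(r["OrderId"], r["Amount"], r["OrderDate"]) for r in csv.DictReader(f)]

# Load the raw rows into a staging table; transformation and modelling happen afterwards in SQL Server.
cursor.executemany(
    "INSERT INTO stg.Sales (OrderId, Amount, OrderDate) VALUES (?, ?, ?)",
    rows,
)
conn.commit()
conn.close()
```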
Current technology stack with Azure SQL Database in the cloud
Here, the on-premises SQL Server has been replaced by an Azure SQL Database in the cloud.
In this variant, Power BI needs a two-step model to ensure reusability: shared Power BI data models pull data from the Azure SQL Database, and Power BI reports then connect only to the data model created in the first step.
Technologies with Fabric (in the future)
In the future, these servers will no longer be needed at all. Following the data lake approach, Fabric can both store the original files (eliminating the file system) and orchestrate the load processes required for the raw data via so-called notebooks (eliminating the C# code on the servers).
Within Fabric, the programming languages of choice (e.g. C#, Python, SQL, Spark) can be flexibly combined for all processing steps, and external libraries can be integrated for further analytical purposes. General availability of Fabric is expected in the course of 2024.
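As a minimal sketch of what such a load notebook could look like, assuming a Fabric lakehouse is attached to the notebook (folder and table names are hypothetical):

```python
# Minimal Fabric notebook sketch (PySpark). In Fabric notebooks the `spark`
# session is predefined; the paths and table names below are assumptions.
from pyspark.sql import functions as F

# Read raw CSV files from the "Files" area of the attached lakehouse.
raw = spark.read.option("header", True).csv("Files/raw/sales/*.csv")

# A simple transformation step: cast a column and add a load timestamp.
cleaned = (
    raw.withColumn("Amount", F.col("Amount").cast("decimal(18,2)"))
       .withColumn("LoadedAt", F.current_timestamp())
)

# Persist the result as a Delta table in the lakehouse, where SQL and
# downstream data models can consume it.
cleaned.write.mode("overwrite").format("delta").saveAsTable("sales_staging")
```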
In addition to the usual data sources, the handling of sensor data is also shown here. Sensor data can be stored in real time in Azure Event Hubs and then read flexibly by Fabric using the Kusto Query Language (KQL). With KQL, billions of records can be queried efficiently for the relevant content.
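A hedged sketch of how such a query could be issued from Python using the azure-kusto-data package; the cluster URI, database, table and column names are all assumptions:

```python
# Hypothetical example: running a KQL query against a KQL database / Azure Data
# Explorer cluster from Python. All names below are assumptions.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://myeventhouse.kusto.fabric.microsoft.com"
)
client = KustoClient(kcsb)

# KQL: average temperature per device over the last hour, top 10 devices.
query = """
SensorReadings
| where Timestamp > ago(1h)
| summarize AvgTemp = avg(Temperature) by DeviceId
| top 10 by AvgTemp desc
"""

response = client.execute("SensorDb", query)
for row in response.primary_results[0]:
    print(row["DeviceId"], row["AvgTemp"])
```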
In Fabric, there will be a special "Direct Lake" mode for Power BI that no longer imports the data into the dataset but accesses the original source in OneLake natively, while offering the same functionality and performance as native import datasets. The dataset is virtualized, so to speak. Power BI therefore no longer needs its own data refreshes, and the separate loading processes become unnecessary.
How new projects can be implemented today to be prepared for Fabric
Since Fabric has not yet been fully released, new projects should use those technologies that are, firstly, as compatible with Fabric as possible and, secondly, will be migrated into Fabric by Microsoft.
Azure Synapse Analytics was called "Azure SQL Data Warehouse" until a few years ago. The "Data Engineering" part of Fabric has already been implemented there: it includes both the notebooks, with which the ETL processes can be flexibly programmed, and the data lake, which dynamically links files from a file system with structured data in database tables.
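Because Synapse notebooks use the same Spark notebook model as Fabric, load logic written there today can later be carried over. A minimal sketch, assuming a hypothetical ADLS Gen2 storage account, container and table names:

```python
# Sketch of a load step in an Azure Synapse Analytics notebook (PySpark);
# `spark` is predefined in Synapse notebooks. Storage and table names are assumptions.
from pyspark.sql import functions as F

raw = (
    spark.read
    .option("header", True)
    .csv("abfss://raw@mydatalake.dfs.core.windows.net/sales/*.csv")
)

cleaned = raw.withColumn("LoadedAt", F.current_timestamp())

# Store the result in the data lake as a Delta table that Power BI can import from.
spark.sql("CREATE DATABASE IF NOT EXISTS staging")
cleaned.write.mode("overwrite").format("delta").saveAsTable("staging.sales")
```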
To create Power BI datasets, this approach still imports the data into Power BI (as before) and reuses the dataset from there.