For business intelligence, you collect all of a company's data in one place and use it for reporting. Fabric goes one step further: it makes this information available for artificial intelligence and distributes it onward. Data is the fuel for AI, and Fabric, as the central data store for Microsoft Copilot, is the place from which Microsoft's AI draws its knowledge. This knowledge can then be used in Power BI, Windows, Microsoft 365, Dynamics 365 and other systems to make your work easier.

"Fabric is the biggest launch of a data product from Microsoft since the launch of SQL Server" (Satya Nadella, 2023)

The Principal Product Manager for Fabric (Microsoft) with Fabian Shirokov (data4success)

We support you with the introduction of Microsoft Fabric, provide independent advice on its advantages and disadvantages, and take care of the technical implementation for the long term. We manage your data in Fabric, structuring and linking it so that it forms a solid foundation for your business today. At the same time, we adapt Power BI to the new Fabric processes, which substantially reduce your loading times, data volumes and storage consumption.

In workshops, we evaluate the feasibility of a migration to Fabric for your company and work out the best individual migration path.

Power BI becomes part of Fabric

For a long time, we regarded Power BI as an independent application. However, it is increasingly becoming part of Fabric. In Power BI, I was able to access and transform data from various sources and get a dataset as a result. In Fabric, I can still access and transform the data, but I can freely define my target, for example by writing the result back to a table. The former silo thus becomes a stage in the data flow, one that also makes the data available to others.

Fabric is the data layer for AI, and the basis of Copilot

This is about much more than just reports. It is about the data basis for the entire Microsoft AI in the company ("Data is the fuel that powers AI"). Fabric is used to manage what data is available in the company, how it is to be interpreted and how it can be evaluated. It thus becomes the fuel of Copilot. As a result, business intelligence experts automatically become the essential foundation for future AI projects.

Using notebooks, in which I can write Spark SQL and Python, the data can be modified as required and used to train AI and machine learning models. I can also use "sempy" (also called "semantic link") in Python to run DAX queries directly against my Power BI semantic models and thus also use all measures. (Quote: "The idea behind it: also look at what the business has created, and use this to train your data models.") I also get a SQL endpoint, which I can access with Management Studio or my own application. However, the permissions for the SQL endpoint depend on the object that I have created in Fabric:

  • Warehouse: A warehouse is like a database with clustered columnstore indexes. It contains structured tables, and the SQL endpoint grants full permissions (e.g. modify data, add data). Although there are no identity columns, temp tables or primary-key/foreign-key relationships, it is otherwise similar to SQL Server. Transactions are also supported here.
  • Lakehouse: A lakehouse is a hybrid of database and file storage, as we have known it from PolyBase or Hadoop over the last decade. There I can analyze not only relational tables but also Excel files, images, videos, audio files and other files. This is the ideal basis for programming your own GPTs later on, and it gives the AI a kickstart! In this hybrid constellation of OneDrive-style file storage and tables, the SQL endpoint has read permissions but can neither add data nor transform objects; this SQL endpoint is called the "warehouse on the lake". If you want to transform data in a lakehouse, you have to do this in a Spark notebook instead, where you also define which file becomes which table and in which way. Fortunately, in these Spark notebooks you can simply add a line with %%sql and then write normal SQL underneath (see the sketch after this list). The storage format in a lakehouse is completely open source and is called "delta parquet files"; it is compatible with Databricks, Snowflake, Amazon and the Google Cloud, among others. The order in which Fabric stores the data in the delta store corresponds to the VertiPaq engine from SQL Server Analysis Services (SSAS), and therefore also to Power BI import mode. Transactions are not supported here. Everything runs on SSDs, with terabyte-scale caching. With the "OneLake File Explorer", I can view what is in OneLake directly in Windows Explorer and even copy and move files there. Unlike in the warehouse, I no longer need a "staging schema" in the lakehouse if the data is based on existing files.
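As a minimal sketch of such a Spark notebook cell (assuming a default lakehouse is attached; the file path and table name are invented for the example), this is how a file from the Files area can become a delta table that the SQL endpoint and Power BI can read:

# Hedged sketch of a Fabric Spark notebook cell: turn a file from the lakehouse "Files" area into a delta table.
# The path and table name are placeholders; "spark" is the session the notebook provides.
df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/orders.csv")    # placeholder path under the attached lakehouse
)

# Save as a managed delta table so the SQL endpoint and semantic models can read it.
df.write.mode("overwrite").format("delta").saveAsTable("orders")

# The same query could also live in its own cell that starts with %%sql, e.g.:
#   %%sql
#   SELECT COUNT(*) FROM orders
spark.sql("SELECT COUNT(*) AS order_count FROM orders").show()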

One copy of the data (this part is revolutionary)

In many cases, the architecture of Fabric means that data no longer needs to be copied in order to be used. Instead of using load processes to copy the data into different systems (for example, from the database into different Power BI semantic models), in future it will be sufficient to have a single virtual reference to the data. This virtual reference can be used in any number of places. A distinction is made here between:

  • Shortcut: This is essentially just a reference to the original data store. Queries are forwarded directly to the source at runtime. This works with Fabric data as well as with Snowflake, Databricks, Amazon, Google Cloud and other data sources based on delta (parquet) files. It also works with Dataverse.
  • Mirroring: Here, a CDC-like stream from a database (e.g. SQL Server) to Fabric is set up automatically. CDC (change data capture) has existed in SQL Server for a long time. Every transaction (e.g. a change to data or to the structure) is then automatically reflected in Fabric with a slight delay.

In both cases, there is no need to program the usual loading processes. Nevertheless, all data can then be linked across sources in a single query (e.g. a join), as in the sketch below.
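To make this concrete, here is a hedged notebook sketch: once a shortcut (or a mirrored table) appears in the lakehouse, it can be joined with local tables like any other table. All table and column names below are invented for the example.

# Hedged sketch: join a local lakehouse table with a table that only exists as a shortcut
# (e.g. pointing at Snowflake or another OneLake location). All names are illustrative.
orders = spark.read.table("orders")                  # local lakehouse table
customers = spark.read.table("customers_shortcut")   # shortcut to external delta data

result = (
    orders.join(customers, on="customer_id", how="inner")
          .groupBy("customer_group")
          .count()
)
result.show()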

Every week we're deploying to production

The features that Fabric will receive are always fixed for the next "semester", which begins on April 1 and October 1 respectively. Every week there are new releases that continue to improve Fabric. Improvement requests are prioritized according to the number of votes on ideas.fabric.microsoft.com. Dark mode for Power BI Desktop, for example, was finally implemented after reaching 10,000 votes.

It's all in one place - unified experience

In the last 20 years, you always needed a variety of tools for business intelligence projects: import programs, ETL, Windows services, Task Scheduler, SSAS, Visual Studio, Management Studio. It was normal to combine three to six different components to reach the goal. Fabric offers all of these functionalities, so I only need one single tool and nothing else. I control everything via Fabric, from start to finish, including AI. For me personally, that's a dream come true. Fabric is designed as "software as a service", where Microsoft takes care of all the settings itself, similar to Microsoft 365.

Excel is not a data source! OK, it's a data source, but not a database

If I have Excel files in the lakehouse, I should first add "convert to list" as an additional step in Power Query so that the content of the Excel files can be translated into SQL. This SQL can then be combined with the rest of the queries, so that the result of a complete Power Query workflow is executed as a single consolidated SQL statement on the source. This process is called "query folding", and you can recognize it in the web-based Power Query editor by the green lightning bolt (instead of the red clock). I can even view execution plans there (right-click => View Execution Plan), where I should see "Value.NativeQuery" on the far left if all is well. However, folding is not yet available in Power BI Desktop. Quote: "I look forward to that day, because this will help everyone."

The heart of Power BI is the ability to create semantic models. This is SSAS Tabular.

Power BI uses exactly the same engine as SQL Server Analysis Services Tabular, which was first introduced with SQL Server 2012. Back then, it replaced the multidimensional OLAP cubes, and today it is still the beating heart of Power BI.

You have the fact tables in the middle, and the dimensions on the outside. That's what Power BI loves, and that's still true

I teach the topic of the star schema with the meaning of facts and dimensions as one of the first topics in every Power BI training course, even before we open Power BI Desktop. Not adhering to it is still one of the most common mistakes I see customers make. Adam Saxton has confirmed it: The star schema, as it was learned 20 years ago, is still the only true basis for Power BI today.
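As a tiny, hedged illustration of that layout (all table and column names invented for the example): one fact table carries the keys and measures, and the dimension tables around it describe them.

# Hedged sketch of a minimal star schema in a lakehouse: a fact table in the middle,
# dimensions around it. Table and column names are illustrative.
dim_date = spark.createDataFrame(
    [(20240101, 2024, "Jan")], ["date_key", "year", "month"]
)
dim_customer = spark.createDataFrame(
    [(1, "Contoso", "Retail")], ["customer_key", "customer_name", "customer_group"]
)
fact_sales = spark.createDataFrame(
    [(20240101, 1, 199.90)], ["date_key", "customer_key", "sales_amount"]
)

# Persist all three as delta tables; the fact table references the dimensions via the keys.
for name, df in [("dim_date", dim_date), ("dim_customer", dim_customer), ("fact_sales", fact_sales)]:
    df.write.mode("overwrite").format("delta").saveAsTable(name)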

In the semantic model we should have clean names for the business

Having cryptic names in the database (e.g. abbreviations) is ok, but not in the semantic model in Power BI. The semantic model should show business users exactly what the column is about.

Never use "my workspace"

"My Workspace" in Power BI is really only intended for beginners who do not have a license. Especially not for sharing reports with colleagues. According to Adam, you should never use it. It is also important to know that Fabric admins can grant themselves access to other colleagues' "My Workspace" at any time without them realizing it. Everyone should be aware of this.

The common distribution method for audiences larger than three people: create an app, and share the app with them.

I love Git integration with Power BI projects

Similar to Visual Studio projects, I can also create entire projects (.pbip files) in Power BI and place them under source control via GitHub or Azure DevOps. Both GitHub and Azure DevOps are 100% owned by Microsoft and can be operated from within Fabric; GitHub, however, is a little more error-resistant. In larger projects, I should have a production branch and my own development branches, and I can then resolve merge conflicts directly in Fabric and merge the branches into each other. The data itself is not checked in; it is only stored in a local .abf file.

Pipeline is an orchestration piece. It manages the dataflows.

I can create a "pipeline" in Fabric, which is similar to an SSIS package (SQL Server Integration Services). There I define a workflow consisting of activities that are linked by conditions. For example, one activity can load a table from a source into Fabric, and the next activity can then refresh a semantic model based on this data. Using the "monitor", I can then see across workspaces which loading processes ran in which pipelines and when, how long they took and which errors they encountered. This means an entire ETL tool is now fully integrated into Fabric, something that was previously only possible with Azure Data Factory (ADF) in the cloud. The individual activities can now finally be implemented with Power Query, just like in Power BI, which will make the development process much easier.
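As a rough, hedged sketch of how such a pipeline can be started outside the portal: Fabric exposes a REST endpoint for running items on demand. The workspace and item IDs, the token, and the exact jobType value below are assumptions for illustration and should be checked against the current Fabric REST API documentation.

# Hedged sketch: trigger a Fabric data pipeline via the "run on demand item job" REST endpoint.
# Endpoint shape and jobType value are assumptions; verify against the Fabric REST API docs.
import requests

workspace_id = "<workspace-guid>"      # placeholder
pipeline_id = "<pipeline-item-guid>"   # placeholder
token = "<entra-id-access-token>"      # placeholder; e.g. obtained via azure-identity

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
)
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
print("Pipeline run accepted with status", response.status_code)  # 202 means the job was queued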

Evaluate your Fabric usage with the capacity metrics app

Fabric can only be licensed via capacities, not per user like Power BI. The hardware behind all Fabric capacities is always the same. If jobs need more juice than the paid capacity provides, they are simply terminated with an error message. The capacities range from approx. 300 EUR per month to enterprise variants with six-digit monthly amounts. The smallest variant was jokingly dubbed "the jail" and proposed as a temporary penalty for users who generate too high a workload. To estimate the required Fabric capacity, a realistic workload should be generated both on the load side (= "background") and on the usage side of the Power BI reports. In Fabric, there is a ready-made Power BI report that makes these metrics available for analysis. At the top right of the report, you can see what percentage of the available capacity was used at what time (reports in red, loading processes in blue). The loading processes are displayed as a 10-minute average. If I right-click and select the drill-through in the chart, this can be broken down into 30-second sections. In addition, load causes are displayed in a burndown table.

I'm a manager, I can drag files into a folder, I'm a data scientist.

It's that easy to work with Fabric: drag in a few files and I can already call myself a data scientist. In theory, at least.

Whenever you have a small amount of data and a good DBA, use DirectQuery.

It works if you have a little bit of data, or even mid-sized data and a very good DBA.

The first time you run a Direct Lake Power BI model, you'll say: Import mode is faster.

From the last eight years of Power BI, we know the query modes "Import" and "DirectQuery". Import mode loads all data into the Power BI file during (typically nightly) refreshes, whereas DirectQuery generates SQL for every user interaction and sends it directly to the database.

A third mode, "Direct Lake", is now being introduced together with the lakehouse. Here, the data must be stored in Fabric, but in return the Power BI DataSets no longer have to load the data themselves; they only exist virtually. Technically speaking, the DataSets (or semantic models) are then an in-memory engine that retrieves its data from the source only at runtime and keeps it cached until the caching algorithm removes it again. This eliminates the loading process for the DataSets! Here is an illustration of mine that shows the savings effect:

In this example, I have 9 tables that become 3 different DataSets. In the old Power BI world (import mode), I would have had to load the corresponding data once for each DataSet, resulting in a total of 17 loading processes. In the new world (Direct Lake), it is sufficient to load each table only once, which saves me 8 loading processes. At the same time, however, this means that some of the data is "cold" and needs to be warmed up first. That is why Patrick recommends adding DAX statements via sempy to a notebook, as the last step of a pipeline, that simulate realistic queries on Monday mornings to warm up the cache.
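A minimal sketch of such a warm-up cell, assuming sempy (semantic link) is available in a Fabric notebook; the model name, measure and column names are placeholders:

# Hedged sketch: warm up a Direct Lake semantic model by running representative queries via sempy.
# The dataset, measure and column names are placeholders for illustration.
import sempy.fabric as fabric

dataset = "Sales Model"  # placeholder semantic model name

# Run a typical measure query so the touched columns are paged into memory.
warm = fabric.evaluate_measure(
    dataset,
    measure="Total Sales",                              # placeholder measure
    groupby_columns=["Date[Year]", "Customer[CustomerGroup]"],  # placeholder columns
)
print(warm.head())

# Alternatively, fire a hand-written DAX query that mirrors a real report page.
dax = """
EVALUATE
SUMMARIZECOLUMNS('Date'[Year], "Sales", [Total Sales])
"""
print(fabric.evaluate_dax(dataset, dax).head())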

In general, data preparation should be performed as far upstream as possible

Thanks, Adam, for the support: data enrichment and transformation should be done as far back in the backend as possible. Fortunately, no one in the room responded to the question "Who uses calculated columns?", because Power BI reports are not there to clean data or to enrich it with categories. Whenever possible, this should be done in SQL or in the loading process upstream of Power BI; then it is also reusable. Enrichments have no place in the report, even if many beginners do this.
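As a small, hedged illustration of what "upstream" can mean in Fabric, the following notebook sketch derives a category column in the lakehouse instead of in a DAX calculated column; the table and column names are invented for the example.

# Hedged sketch: derive a "price band" category upstream in a Fabric notebook
# instead of as a DAX calculated column. Table and column names are illustrative.
from pyspark.sql import functions as F

sales = spark.read.table("sales")  # assumes a lakehouse table named "sales" exists

enriched = sales.withColumn(
    "price_band",
    F.when(F.col("unit_price") < 10, "Low")
     .when(F.col("unit_price") < 100, "Medium")
     .otherwise("High"),
)

# Write the enriched delta table so every semantic model and report can reuse the category.
enriched.write.mode("overwrite").format("delta").saveAsTable("sales_enriched")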

Adam, put us to sleep with security...

In general, it can be said that Fabric is ISO 27001-certified and therefore meets high data protection requirements. The Microsoft cloud is a high-security wing that medium-sized companies cannot match on their own. Entra ID (formerly Azure Active Directory) is used in Fabric in the same way as in Power BI, and TLS 1.2 encryption is active on all connections.

As we have many different objects in Fabric, there are also different security levels, some of which are still under development (preview):

  • Workspace security: I can assign a specific role to a user for a workspace, e.g. Viewer, Contributor or Admin. Viewers can actually only see Power BI reports and no other Fabric items, whereas Contributors and Admins can also see all Fabric items.
  • Row-level security for semantic models: As in Power BI, I can define roles per semantic model and restrict their visibility to elements of one dimension (e.g. customer group).
  • Row-level security for tables: Since I can also hand SQL endpoints to users in Fabric, I can pre-filter these tables so that a user can only query the data relevant to them in SQL. However, this only works if the filter criterion actually exists in this one table and not in another table. I remember a project where we had already used this in SQL Server 2012: same syntax, same effect. However, if I have fact tables typical for a DWH, then my filter criterion is usually no longer in the table itself. Another disadvantage is that Direct Lake mode cannot be combined with these filter criteria, so the slower DirectQuery is always used as a fallback. The filter is attached as a predicate function via a security policy, for example:
CREATE SECURITY POLICY Security.SalesFilter  -- policy and table names are illustrative
ADD FILTER PREDICATE Security.tvf_securitypredicate(SalesRep) ON dbo.InternetSales WITH (STATE = ON);
  • Object-level security for columns in SQL: For a SQL endpoint that I provide to a user, I can specify which columns they are allowed to see and which they are not. It's quite simple:
GRANT SELECT ON InternetSales (Column1, Column2, ...) TO [ReportReader];  -- the principal name is illustrative
DENY SELECT ON InternetSales (Column3) TO [ReportReader];
  • Object-level security for tables: As in SQL Server, I can define roles that have rights at different levels (read / write / alter / execute etc.), either on a schema or on selected database objects such as tables, views and stored procedures, and then assign users to these roles. It should be noted, however, that this part is still in preview. Permissions on other files (e.g. images, audio files) also do not yet seem to be possible at such a granular level.