Python with Power BI
Python with Power BI

Python With Power BI

Actually, when you need to create a bilateral data analysis you can use Power BI by Microsoft. It is not only interactive but also can visualize your information for your business intelligence. So that is why it is called BI or in other words Business Intelligence. It is useful because you can use Python with it.

Using Python you can enhance Power BI programing language capabilities. Python dashboard for data analysis, data acceptance, data conversion, data addition, and data visualization, also can use complicated functions such as machine learning libraries and more. This all can be done with numerous data dashboards with Python thankfully to these two mechanics Power BI and Python. Or in other words, thanks to Python BI Power.

Either experts or newbies can use this blog to learn Python BI Power. We will also use pandas and Matplotlib libraries.

Actually, pandas is an open-source library that can be used for operating with relative and termed data simply. It provides an interactive dashboard Python made from data and some functions for working with numbers and times. The core of pandas is the NumPy library and it is efficient and it has high performance.

And matplotlib is one of the excellent libraries that can help you visualize Power BI drag and drop dashboard for arrays. It is useful when you need to create a visualisation of big numbers. And also it has a lot of settings for your plots such as lines, bars, and even histograms.

So the answer to the question: When can you use BI Power? is “You can use them with a sales data dashboard using Python.”.

The Start in Using Power BI

As it is logical, Power BI code language is created by Microsoft so the only operating system where it runs on is Windows. But do not worry, you can use it with macOS or Linux distribution or any other operating system if it supports virtual machines. The version of Windows that supports Microsoft BI Power is Windows 8.1 and newer. Actually, if you are not using Windows as your main operating system, you need to have 30 gigabytes for a virtual operating system.

Installation of Power BI Desktop

Here we will try to set all tools and after that, we can make codes using Python. Microsoft Power BI Desktop is a powerful collection of tools and services that can be obtained, free of charge, without a Microsoft account, and with no need for an Internet connection. You can easily install it on your computer by accessing the Microsoft Store from the Start menu or its web-based storefront. With this amazing suite, you can work offline like a traditional office suite, giving you the convenience of having all the tools and services you need in one place. By installing Power BI Desktop from the Microsoft Store, you can ensure automatic and quick updates to the most recent versions of the tool without having to be logged in as the system’s administrator.

 

If the usual method of installing Power BI Desktop doesn’t work for you, you can always try downloading the installer from the Microsoft Download Center and running it manually. This executable file is about 400MB in size. Once you have the application installed, launch it and you’ll be met with a welcome screen. At first, the Power BI Desktop user interface may seem intimidating, but don’t worry. You’ll become accustomed to the basics as you progress through the tutorial.

What Python Code Editor to Choose?

To maximize your experience, why not install Microsoft Visual Studio Code? This free and modern code editor is immensely popular and can be easily found in the Microsoft Store. If you already use an IDE such as PyCharm or don’t require any of the advanced editing capabilities, feel free to skip this step. Otherwise, Visual Studio Code is a great choice for any coding enthusiast.

Microsoft Power BI Desktop offers only basic code editing capabilities, which is understandable given its primary purpose as a data analysis tool. Unfortunately, it lacks advanced features such as intelligent contextual suggestions, auto-completion, or syntax highlighting for Python, which are essential for writing anything but the most easy Python scripts in Power BI coding language. Consequently, it is highly recommended that you use an external code editor for writing more complex Python scripts Power BI.

You can download Visual Studio Code on any operating system without virtual machines and you can find it on the Microsoft website. The installation is simple as Visual Studio helps you when you download the installer.

VS Code is a cutting-edge code editor that brings seamless support for a wide array of programming languages via its extensions. Actually, it does not Power BI Python support from the box as PyCharm, but it will offer to install extensions to start working. Say goodbye to limitations, as VS Code doesn’t just stop at Python for Power BI. With its intuitive interface, it becomes a Python powerhouse once you open an existing Python file or create a new one, automatically recognizing the language and prompting you to install the best set of recommended extensions designed specifically for Python programming.

But all this works only if you have already installed raw Python on your computer. You can google it and find the answer to how to do this.

Does BI Desktop Need Some Libraries?

The answer is “Yes, it does”. Unleash the full potential of Power BI Desktop by ensuring that your Python setup is equipped with pandas and Matplotlib. These libraries are not part of the standard installation, but can easily be obtained if you’ve utilized Anaconda. It’s worth noting that incorporating third-party packages into the global Python interpreter is discouraged, as it can pose potential risks. Moreover, attempting to run the system interpreter from Power BI Python on a Windows machine is not possible due to permission restrictions. The solution? A Python virtual environment is a secure and efficient way to manage your Python packages and dependencies.

What is a Virtual Environment?

An isolated folder, a virtual environment, comprises a directory that comprises a replication of the main Python interpreter, allowing you to experiment with your heart’s content. You can install any additional libraries within this space without any concern of disrupting other programs that are reliant on Python. And, whenever you wish, you can effortlessly eliminate the folder holding your virtual environment without any adverse effect on the existence of Python on your machine.

To create a new virtual space you need to use a Windows terminal and write the command:

python -m venv python-virtual

Here we named our folder “python-virtual” but you can use any you want.

In just a matter of moments, a fresh folder containing a duplicate of the Python interpreter will materialize on your desktop. With this, you can activate the virtual environment by executing its activation script, and subsequently install the two libraries that are required by Power BI. To achieve this, type in the following commands while your desktop remains in your present working directory:

.\python-virtual\Scripts\activate
python -m pip install pandas matplotlib

Upon activation, you should be able to identify your virtual environment with the name “python-virtual” in the command prompt. Failure to do so would result in the installation of additional third-party packages into the primary Python interpreter, which is precisely what we aimed to avoid. Congratulations, you’re almost there! You can repeat the activation and pip installation steps if you wish to incorporate additional libraries into your virtual environment. Finally, the next step is to inform Power BI of the location of Python in your virtual environment.

Let’s Run Python on It

Firstly, we need to set special options in Power BI Desktop. Once you access the configuration options, you’ll find various settings organized by categories. Locate the category named “Python scripting” in the left-hand column, and proceed to set the Python home directory by selecting the “Browse” button.

To ensure proper functionality, it’s important to specify the path to the “Scripts” subfolder in your virtual environment, which contains the “python.exe” executable. If your virtual environment is located in your Desktop folder, your path should resemble the following format:

\Desktop\python-virtual\Scripts

Before “\Desktop” must be the name of your user. If the designated path is incorrect and does not contain a virtual environment, you will receive an appropriate error message.

Great job! You have now successfully configured Python to Power BI. One key setting to verify is the path to your Power BI to Python virtual environment. It should include both the pandas and Matplotlib libraries. With this setup complete, you’re ready to start exploring the capabilities of Python with Power BI in the next section.

So next let’s talk about running the code and how it works.

How Can You Operate With It?

There are several methods available to execute Python and Power BI, each of which seamlessly integrates with a data analyst’s regular workflow. One such method involves using Python as a data science using Power BI source to import or create datasets within your report. Another method involves utilizing Python to perform data cleaning and other transformations on any dataset directly in Power BI with Python. Additionally, Python’s advanced plotting libraries can be used to create engaging and informative data visualizations. This article will explore all three of these applications in detail.

Using pandas.DataFrame

If you need to ingest data into Power BI from a proprietary or legacy system, you can use a Python  BI script tool to connect to the system and load the data into a pandas DataFrame. This is a useful approach when the data is stored in an obsolete or less commonly used file format that Power BI does not support natively.

To get started, you can write a Python script that connects to the legacy system and loads the data into a pandas DataFrame. Once the data is in a DataFrame, you can manipulate it using the pandas library to clean and transform the data as needed.

Power BI can then access the DataFrame by connecting to the Python script and retrieving the DataFrame. This allows you to leverage the power of both tools – the data manipulation capabilities of Python and the visualization and reporting capabilities of Power BI.

In this tutorial, we’ll use Python to load fake sales data from SQLite, which is a popular file-based database engine. While it is technically possible to load SQLite data directly into Power BI Desktop using an appropriate driver and connector, using Python can be more convenient since it supports SQLite out of the box.

Before jumping into the code, it would help to explore your dataset to get a feel for what you’ll be dealing with. It’s going to be a single table consisting of used car dealership data stored in the “sales.db” file.

Let’s imagine this file has a thousand records and a lot of columns of data in the table, which represent sold goods, their buyers, and the date of sold items. May you remember what we mentioned about Anaconda? Yes, Anaconda has a Jupyter Notebook that can be called a code editor. You can quickly visualize this sample database by loading it into a pandas DataFrame and sampling a few records in a Jupyter Notebook using the following Power BI Python tutorial:

import sqlite3
import pandas as pand

with sqlite3.connect(r"C:\Users\User\Desktop\sales.db") as connection:
  df = pand.read_sql_query("SELECT * FROM sales", connection)

df.sample(15)

Note that the path to the sales.db file may be different on your computer. If you can’t use Jupyter Notebook, then try installing a tool like SQLite Browser and loading the file into it.

At a glance, you can tell that the table needs some cleaning because of several problems with the underlying data. However, you’ll deal with most of them later, in the Power Query editor, during the data transformation phase. Right now, focus on loading the data into Power BI.

As long as you haven’t dismissed the welcome screen in Power BI yet, then you’ll be able to click the link labeled Get data with a cylinder icon on the left. Alternatively, you can click Get data from another source on the main view of your report, as none of the few shortcut icons include Python. Finally, if that doesn’t help, then use the menu at the top by selecting Home › Get data › More… as depicted below.

Doing so will reveal a pop-up window with a selection of Power BI connectors for several data sources, including a Python script, which you can find by typing Python into the search box.

Select it and click the Connect button at the bottom to confirm. Afterward, you’ll see a blank editor window for your Python script, where you can type a brief code snippet to load records into a pandas DataFrame.

You can notice the lack of syntax highlighting or intelligent code suggestions in the editor built into Power BI. As you learned earlier, it’s much better to use an external code editor, such as VS Code, to test that everything works as expected and only then paste your Python code to Power BI.

Before moving forward, you can double-check if Power BI uses the right virtual environment, with pandas and Matplotlib installed, by reading the text just below the editor.

While there’s only one table in the attached SQLite database, it’s currently kept in a denormalized form, making the associated data redundant and susceptible to all kinds of anomalies. Extracting separate entities, such as cars, sales, and customers, into individual DataFrames would be a good first step in the right direction to rectify the situation.

Fortunately, your Python script may produce as many DataFrames as you like, and Power BI will let you choose which ones to include in the final report. Here in Power BI programming language code, you can extract those three entities with pandas using column subsetting in the following way:

import sqlite3
import pandas as pand

with sqlite3.connect(r"C:\Users\User\Desktop\sales.db") as connection:
  df = pand.read_sql_query("SELECT * FROM sales", connection)

goods = df[
[
"color",
"purchase_date",
"purchase_price",
"investment",
]
]
sales = df[["sale_price", "sale_date"]]

First, you connect to the SQLite database by specifying a suitable path for the car_sales.db file, which may look different on your computer. Next, you run a SQL query that selects all the rows in the sales table and puts them into a new pandas DataFrame called df. Finally, you create three additional DataFrames by cherry-picking specific columns. It’s customary to abbreviate pandas as pd in Power BI coding. Often, you’ll also see the variable name df used for general, short-lived DataFrames. However, as a general rule, please choose meaningful and descriptive names for your variables to make the code more readable.

When you click OK and wait for a few seconds, Power BI will present you with a visual representation of the four DataFrames produced by your Python script.

The resulting table names correspond to your Python variables. When you click on one, you’ll see a quick preview of the contained data. The screenshot above shows the customers table, which comprises only two columns.

Select cars, customers, and sales in the hierarchical tree on the left while leaving off df, as you won’t need that one. You could finish the data import now by loading the selected DataFrames into your report. However, you’ll want to click a button labeled Transform Data to perform data cleaning using pandas in Power BI.

In the next section, you’ll learn how to use Python to clean, transform, and augment the data that you’ve been working within Power BI.

Using Python Query Editor

If you have followed the instructions in this guide, you should now be in the Power Query Editor, which displays the three DataFrames you selected earlier. These DataFrames are referred to as queries in this particular view. However, if you have already imported data into your Power BI report without applying any transformations, there’s no need to worry! You can access the same editor at any time.

To do so, navigate to the Data perspective by clicking on the table icon located in the center of the ribbon on the left-hand side, and then select Transform data from the Home menu. Alternatively, you can right-click on one of the fields in the Data view on the far right of the window and choose the Edit query for the same result. Once you have accessed the Power Query Editor window again, you will be able to see your DataFrames or Queries on the left-hand side, while the Applied Steps for the currently selected DataFrame will be displayed on the right-hand side, with rows and columns in the center.

Each step in the Applied Steps represents a sequence of data transformations that are applied in a pipeline-like fashion against a query, from top to bottom. Each step is expressed as a Power Query M formula. The first step, named Source, involves invoking your Python script, which generates four DataFrames based on the SQLite database. The other two steps extract the relevant DataFrame and transform the column types.

By clicking the gear icon next to the Source step, you’ll reveal your data ingestion script’s original Python source code. This feature can access and edit Python code baked into a Power BI report even after saving it as a .pbix file.

You can insert custom steps into the pipeline for more granular control over data transformations. Power BI Desktop offers plenty of built-in transformations that you’ll find in the top menu of Power Query Editor. But in this tutorial, you’ll explore the Run Python script transformation, which is the second mode of running Python code in Power BI:

Conceptually, it works almost identically to data ingestion, but there are a few differences. First of all, you may use this transformation with any data source that Power BI supports natively, so it could be the only use of Python in your report. Secondly, you get an implicit global variable called dataset in your script, which holds the current state of the data in the pipeline, represented as a pandas DataFrame.

Note: As before, your script can produce multiple DataFrames, but you’ll only be able to select one for further processing in the transformation pipeline. You can also decide to modify your dataset in place without creating any new DataFrames.

Pandas lets you extract values from an existing column into new columns using regular expressions. For example, some customers in your table have an email address enclosed in angle brackets (<>) next to their name, which should belong to a separate column.

Select the customer’s query, then select the last Changed Type step, and add a Run Python script transformation to the applied steps. When the pop-up window appears, type the following Python script code example:

dataset = dataset.assign(
full_name=dataset["customer"].str.extract(r"([^<]+)"),
email=dataset["customer"].str.extract(r"<([^>]+)>")
).drop(columns=["customer"])

When working with Power BI, you can utilize the implicit dataset variable in your script to reference the customer’s DataFrame, giving you access to its methods and allowing you to override it with your transformed data. Alternatively, you have the option to define a new variable for the resulting DataFrame. During the transformation process, you can add two new columns, full_name and email, and then remove the original customer column containing both information pieces.

Once you’ve finished your transformation, clicking OK and waiting a few seconds will display a table showing the DataFrames your script produced. In this case, there is only one DataFrame named dataset, as you reused the implicit global variable provided by Power BI for your new DataFrame. To choose your desired DataFrame, simply click the yellow Table link in the Value column.

Your customers` table now has two new columns, allowing you to quickly identify customers who have not provided their email addresses. If you desire further transformations, you can add additional steps. For example, you could split the full_name column into separate columns for first_name and last_name, assuming that there are no instances of customers with more than two names.

Be sure to select the final transformation step and insert another Run Python script. The corresponding Python code for this step should appear as follows:

dataset[
["first_name", "last_name"]
] = dataset["full_name"].str.split(n=1, expand=True)
dataset.drop(columns=["full_name"], inplace=True)

Unlike in the previous step, the dataset variable refers to a DataFrame with three columns, full_name, and email, because you’re further down the pipeline. Also, notice the inplace=True parameter, which drops the full_name column from the existing DataFrame rather than returning a new object.

You’ll notice that Power BI gives generic names to the applied steps and appends consecutive numbers to them in case of many instances of the same step. Fortunately, you can give the steps more descriptive names by right-clicking on a step and choosing Rename from the context menu:

By editing Properties…, you may also describe in a few sentences what the given step is trying to accomplish.

When you’re finished transforming your datasets, you can close the Power Query Editor by choosing Close & Apply from the Home ribbon or its alias in the File menu:

This will apply all transformation steps across your datasets and return to the main window of Power BI Desktop.

Next up, you’ll learn how to use Python to produce custom data visualizations.

Power BI Python Data Transformation

So far, we’ve covered importing and transforming data using Python in Power BI Desktop. Python’s third and final application is creating visual representations of your data. When it comes to visualizations, you have the flexibility to use any of the supported Python libraries, provided you’ve installed them in the virtual environment that Power BI utilizes. However, Matplotlib serves as the foundation for plotting, which other libraries delegate to in any case.

If Power BI hasn’t already directed you to the Report perspective following your data transformations, you can now navigate by clicking on the chart icon on the left ribbon. This will bring up a blank report canvas where you can add your graphs and other interactive components, collectively referred to as visuals.

Over on the right in the Visualizations palette, you’ll see several icons corresponding to the available visuals. Find the icon of the Python visual and click it to add the visual to the report canvas. The first time you add a Python or R visual to a Power BI report, it’ll ask you to enable script visuals.

In fact, it’ll keep asking you the same question in each Power BI session because there’s no global setting for this. When you open a file with your saved report that uses script visuals, you’ll have the option to review the embedded Python code before enabling it. Why? The short answer is that Power BI cares for your privacy, as any script could leak or damage your data if it’s from an untrusted source.

However, if you’ve configured Power BI to use an external code editor, then clicking on the little skewed arrow icon (↗) will launch it and open the entire scaffolding of the script. You can ignore its content for the moment, as you’ll explore it in an upcoming section. Unfortunately, you have to manually copy and paste the script’s part between the auto-generated # Prolog and # Epilog comments back to Power BI when you’re done editing.

Note: Don’t ignore the yellow warning bar in the Python script editor, which reminds you that rows with duplicate values will be removed. If you only dragged the color column, then you’d end up with just a handful of records corresponding to the few unique colors. However, adding the vin column prevents this by letting colors repeat throughout the table, which can be useful when performing aggregations.

To demonstrate an elementary use of a Python visual in Power BI, you can plot a bar chart showing the number of goods painted in a given color. Here is an example of import visualize Python:

import matplotlib.pyplot as mat

mat.style.use("seaborn")

series = dataset[dataset["color"] != ""]["color"].value_counts()
series.plot(kind="bar", color=series.index, edgecolor="black")

mat.show()

To get started with creating visualizations, you can begin by enabling Matplotlib’s theme that mimics the seaborn library. This will provide a more visually appealing look and feel compared to the default theme.

Next, you can remove any records with missing color data, and count the number of remaining records in each unique color group. This will result in pandas.Series object that can be plotted and color-coded using its index, which consists of the color names. Finally, you can render the plot by calling plt.show().

With these steps, you can easily create a basic visualization of your data using Python in Power BI. Of course, the possibilities for visualizing your data are endless, and you can explore and experiment with other Python libraries and techniques to create even more engaging and informative visualizations.

Additional Settings For Power BI Desktop

With the power of pandas and Python, there are countless possibilities for transforming your datasets in Power BI. Some examples include:

  • Anonymizing sensitive personal information, such as credit card numbers
  • Identifying and extracting new entities from your data
  • Rejecting sales with missing transaction details
  • Removing duplicate sales records
  • Unifying inconsistent purchase and sale date formats

These are just a few ideas to get you started, but the possibilities are endless. While we can’t cover everything in this article, don’t hesitate to experiment on your own. Keep in mind that your success in using Python to transform data in Power BI will depend on your understanding of pandas, which is the library that Power BI uses under the hood. The more you learn about pandas and their capabilities, the more you can achieve with your data in Power BI.

    Special Code Editor

    Within the Python scripting options in Power BI, a useful setting allows you to specify the default Python integrated development environment (IDE) or code editor you prefer to use when working on a code snippet. You can stick with the operating system’s default program associated with the .py file extension, or you can select a specific Python IDE of your choice to launch within Power BI. This flexibility can make it easier and more efficient for you to write and debug Python code directly in Power BI.

    To indicate your preferred Python Integrated Development Environment (IDE), opt for “Other” from the initial dropdown menu, and navigate to the executable file of your preferred code editor. For instance, you may browse this one:

    \Desktop\Programs\Microsoft VS Code\Code.exe
    

      As before, the path to your app can be different and contain different folders.

        What are the Cons of Using Python Power BI?

        Python integration in Power BI Desktop has some limitations you should be aware of.

          Timeouts

          The most notable limitations are related to timeouts, data size, and non-interactive visuals. Your data ingestion and transformation scripts defined in Power Query Editor can’t run longer than thirty minutes. Python scripts in Power BI visuals are limited to only five minutes of execution, and there are additional data size limitations, such as only being able to plot the top 150,000 rows or fewer in a dataset and the input dataset can’t be larger than 250 megabytes.

            Marshaling

            In Power BI Desktop, the communication between Power BI and Python is done by exchanging CSV files. Therefore, when using Python to manipulate data, the script must load the dataset from a text file created by Power BI for each run, and then save the results to another text or image file for Power BI to read. This redundant data marshaling can result in a significant performance bottleneck when working with larger datasets. It is the biggest drawback of Python integration in Power BI Desktop.

            If you encounter poor performance, you may want to consider using Power BI’s built-in transformations or the Data Analysis script Expression (DAX) formula language instead of Python. Another approach to improve performance is to reduce the number of data serializations by collapsing multiple steps into a single Python script that does the heavy lifting in bulk. For example, instead of making multiple steps in the Power Query Editor for a very large dataset, you can combine them into the first loading script.

              Python Visualization

              Data visualizations created using Python code are static images, which means you can’t interact with them to filter your dataset. However, Power BI will update the Python visuals in response to interacting with other visuals. It’s worth noting that Python visuals take slightly longer to display due to the data marshaling overhead and the need to run Python code to render them.

                And Others

                Using Python in Power BI has some other minor limitations. For instance, it can be difficult to share Power BI reports that rely on Python code with others, since the recipients would need to install and configure Python. Additionally, all datasets in a report must be set to a public privacy level for Python scripts to work properly in the Power BI service. Furthermore, a finite number of supported Python libraries in Power BI exist. There may be additional minor limitations, which can be found in Microsoft’s documentation that outlines how to prepare a Python script and known limitations of Python visuals in Power BI.