Graph API and analytical queries

Introduction

Sometimes you need to run analytical queries that are not possible to achieve with the Microsoft Graph API alone. For example, you want group all users by country, state, city and department. You want to check which applications do not have service principals, etc..

You don't want install a SQL Server or PostgreSQL to run these queries. You don't want to import data to a database. So, how to deal with this situation?

One solution is to use DuckDB, a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations and complex types.

DuckDB can import data from csv, json, or parquet files.

Prerequisites

Let's focus on a PowerShell. If you are familiar with the Graph API, you know that you can use the Microsoft Graph PowerShell SDK to interact with the Graph API.

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

Install the Microsoft.Graph module and PSDuckDB module.

Install-Module Microsoft.Graph
Install-Module PSDuckDB

Examples

Example 1

Let's start with a simple example. I want to get a number of member users grouped by country, state and city.

First, I will retrieve all member users from my tenant by executing Get-MgUser cmdlet and exclude guest users.

Connect-MgGraph -Scopes "User.Read.All"

$users = Get-MgUser -All -Property "id,city,country,state" -Filter "userType eq 'Member'"
$users.ToJsonString() | Out-File "$PSScriptRoot\users.json" -Encoding utf8

It will generate a JSON file with all member users in the tenant. The ToJsonString method creates a JSON string from the Graph object.

Now, prepare SQL query to run against this data. The source is a JSON file, the ouput from the previous step.

$db = New-DuckDBConnection

$sql = @"
 SELECT COUNT(*) as count,country,state,city
 FROM '$PSScriptRoot\users.json'
 GROUP BY country,state,city
 ORDER BY count desc;
"@

$db.sql($sql) | Format-Table *
$db.closedb()

The result for my test tenant is:

count country          state                    city
----- -------          -----                    ----
46934
 2720 IT               Monza e Brianza          Monza
 2651 CN               Shanghai                 Shanghai
 2544 IN               Karnataka                Bangalore
 2399 CZ                                        Brno
 2228 US               North Carolina           Greenville
 2133 SG                                        Singapore
 2058 US               California               Remote

The number 46934 on the first line is the number of member users without a country, state and city.

Example 2

Another example. Retrieve all applications and service principals from the tenant. Find the applications that have a service principal.

Connect-MgGraph -Scopes "User.Read.All"

$applications = Get-MgApplication -All -Property "id,appId,displayName,signInAudience"
$applications.ToJsonString() | Out-File '$PSScriptRoot\apps.json' -Encoding utf8

$servicePrincipals = Get-MgServicePrincipal -All -Property "id,appId,displayName,servicePrincipalType,signInAudience"
$servicePrincipals.ToJsonString() | Out-File '$PSScriptRoot\sp.json' -Encoding utf8

Returns the applications that have a service principal.

$db = New-DuckDBConnection

$sql = @"
SELECT app.id, app.displayName
FROM '$PSScriptRoot\apps.json' app
SEMI JOIN '$PSScriptRoot\sp.json' sp
ON (app.appId = sp.appId)
ORDER BY app.displayName;
"@

$db.sql($sql) | Format-Table *
$db.closedb()

Semi join returns applications that have a service principal.

Part of the output:

id                                   displayName
--                                   -----------
47110732-2b81-4c52-a91e-d9315d9c456b sva-teamslocal
340e2670-6ec1-4dc7-b4fd-21ec6116b971 teams-botlocal
b2d38dab-50b5-44ae-bcf9-876671beec96 test
eef73758-88f1-4e40-9db8-99dada60ae48 test
4f90495d-cfd5-4910-a788-07577b2ab65d test
d3e28a8d-ab31-4521-97e7-68585f08c359 test-bot
f7e34a90-4dc8-4eb1-a5ce-b553ca19782f testbot-8f57

Example 3

Similar example, retrieve all applications that don't have a service principal.

$db = New-DuckDBConnection

$sql = @"
SELECT app.id, app.displayName
FROM '$PSScriptRoot\apps.json' app
ANTI JOIN '$PSScriptRoot\sp.json' sp
ON (app.appId = sp.appId)
ORDER BY app.displayName;
"@

$db.sql($sql) | Format-Table *
$db.closedb()

Anti join returns applications that don't have a service principal.

Part of the output:

id                                   displayName
--                                   -----------
4c6b1842-40ad-40bc-bd44-333be15a4a56 AD LCMS Platform bot (Power Virtual Agents)
332dd938-4914-4a67-9628-7dc8b70a4eb2 AD LCMS Project bot (Power Virtual Agents)
e50de004-8494-4c5a-a5a4-69de1aca9d79 ADAPT (DEV/Migrated)
fb735249-9c4f-4ea3-ad8c-6b91b692c04d ADAPT (Web API DEV/Migrated)
9692c5f9-f14b-49ed-b02e-50be1f7dfbbe ADD Identity Provider
144520ae-8959-4849-9a1f-57a66e8f9495 ADP
93165b49-22d3-427c-b219-026fe81c4dbe ADP (Linkouts)
31cbefc3-6276-40ef-88bd-21399147dd30 ADP GlobalView

Conclusion

With the combination of the Graph API and DuckDB, you can run analytical queries on data in your tenant without the need to install a database system.

You can use the Graph API to get data from your tenant and DuckDB to run analytical queries. The PSDuckDB module provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

1
Buy Me a Coffee at ko-fi.com
An error has occurred. This application may no longer respond until reloaded. Reload x