September 30, 2022

Robotic Notes


Databricks cross-workspace administration with Powershell and WPF – Java Code Geeks



For quite some time now, I have been working on cloud data analytics platforms in which Databricks plays a major role. Databricks is a cross-cloud product (AWS, Azure, GCP), developed by the creators of Apache Spark, that provides a user interface for sharing notebooks and managing clusters. The widely used Premium tier also includes, among other features, user management and role-based access control (RBAC) for notebooks, clusters, jobs and tables.

Even a minimal Databricks setup includes at least three workspaces: Dev, Test and Prod. In practice, many more are created, because separate workspaces are often set up for engineers, data scientists and analysts in each of those environments. A dozen workspaces spread across cloud environments and subscriptions is quite common.

Challenges & Solutions

With that in mind, challenges quickly arise in administering user groups and auditing all these workspaces, since the Databricks UI only ever shows one workspace at a time. This is where PowerShell scripts can help. However, command-line scripts can only go so far; what is really needed is a cross-workspace GUI for administration and oversight. Enter WPF (Windows Presentation Foundation): the UI is written in XAML, and the various widgets (WPF controls) can be dragged and dropped in Visual Studio's graphical designer. All that remains is to integrate PowerShell with WPF/XAML while keeping code and GUI separate.

In the XAML file, every control we want to reach from PowerShell gets an x:Name attribute. For example, for an ‘Add’ button:

<Button x:Name="Add" Content="Add" />

In the PowerShell script, we generate a variable named var_Add for that control, which is then used to handle the Add button's events:

# ----------------WPF-XAML WINDOW UI----------------------------------

Add-Type -AssemblyName PresentationCore,PresentationFramework,WindowsBase,system.windows.forms

$xamlFile = ".\files\MainWindow.xaml"

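# Create window: remove the mc:Ignorable attribute added by the Visual Studio designer and the
# x:Class part of the opening <Window line (XamlReader cannot load x:Class), and turn x:Name
# into Name so that named controls can be found with XPath below
$inputXML = Get-Content $xamlFile -Raw -Force
$inputXML = $inputXML -replace 'mc:Ignorable="d"', '' -replace "x:N", 'N' -replace '^<Win.*', '<Window'
[XML]$xaml = $inputXML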

#Read XAML
$reader = (New-Object System.Xml.XmlNodeReader $xaml)
try {
    $window = [Windows.Markup.XamlReader]::Load( $reader )
} catch {
    Write-Warning $_.Exception
    throw
}

# Create variables based on form control names.
# Each variable is named 'var_<control name>', e.g. the "Add" button becomes $var_Add.

$xaml.SelectNodes("//*[@Name]") | ForEach-Object {
    # -ErrorAction Stop turns any Set-Variable failure into a terminating error
    Set-Variable -Name "var_$($_.Name)" -Value $window.FindName($_.Name) -ErrorAction Stop
}

# To list the generated control variables while debugging, run: Get-Variable var_*

The same logic applies to every named WPF control we want to drive from PowerShell. The ‘Add’ button's click event is then handled like this:

$var_Add.Add_Click( {
    # process here
})

# Show the window only after all event handlers have been registered
$null = $window.ShowDialog()
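The XAML preprocessing in the loader can also be exercised without WPF, which is handy for checking which var_ variables a given XAML file will produce. The sketch below runs only the XML part of the loader (so it also works in PowerShell 7 on non-Windows systems) against a hypothetical minimal window containing a TextBox named GroupName and the Add button. It reads the attribute with GetAttribute('Name') to sidestep any ambiguity with the XmlElement.Name property.

```powershell
# Hypothetical minimal window, laid out the way Visual Studio writes it:
# x:Class on the first line, namespace declarations on the following lines.
$sampleXaml = @'
<Window x:Class="DbxAdmin.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="DBX-Admin">
    <StackPanel>
        <TextBox x:Name="GroupName" />
        <Button x:Name="Add" Content="Add" />
    </StackPanel>
</Window>
'@

# Same replacements as the loader: drop mc:Ignorable, turn x:Name into Name,
# and cut the rest of the opening "<Window ..." line (which carries x:Class).
$cleaned = $sampleXaml -replace 'mc:Ignorable="d"', '' -replace 'x:N', 'N' -replace '^<Win.*', '<Window'
[xml]$doc = $cleaned

# Every element that now has a plain Name attribute would become a var_<Name> variable
$variableNames = $doc.SelectNodes('//*[@Name]') | ForEach-Object { "var_$($_.GetAttribute('Name'))" }
$variableNames   # -> var_GroupName, var_Add
```

XamlReader.Load and FindName are still required to obtain the actual control objects; this only previews the names.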

Now that we know how to build a PowerShell-based UI, what remains is to call the Databricks REST API, either directly or through existing Databricks PowerShell modules that wrap it.
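The direct REST route can be sketched as follows. This is a minimal, hypothetical example rather than the DBX-Admin implementation: it assumes personal access token authentication (an "Authorization: Bearer <token>" header), workspace URLs supplied by hand, and uses the Clusters API endpoint /api/2.0/clusters/list purely as an example. New-DbxRequest and Get-DbxAcrossWorkspaces are illustrative names.

```powershell
# Hypothetical helper: build a splattable parameter set for Invoke-RestMethod.
function New-DbxRequest {
    param(
        [Parameter(Mandatory)][string] $WorkspaceUrl,  # e.g. https://adb-<id>.<n>.azuredatabricks.net
        [Parameter(Mandatory)][string] $Endpoint,      # e.g. /api/2.0/clusters/list
        [Parameter(Mandatory)][string] $Token          # assumed: a personal access token
    )
    @{
        Uri     = $WorkspaceUrl.TrimEnd('/') + $Endpoint
        Method  = 'Get'
        Headers = @{ Authorization = "Bearer $Token" }
    }
}

# Hypothetical helper: call the same endpoint on every workspace in the list.
function Get-DbxAcrossWorkspaces {
    param(
        [Parameter(Mandatory)][string[]] $WorkspaceUrls,
        [Parameter(Mandatory)][string]   $Endpoint,
        [Parameter(Mandatory)][string]   $Token
    )
    foreach ($url in $WorkspaceUrls) {
        $request = New-DbxRequest -WorkspaceUrl $url -Endpoint $Endpoint -Token $Token
        try {
            $response = Invoke-RestMethod @request
            # Tag each response with its workspace so results can be merged into one view
            [pscustomobject]@{ Workspace = $url; Response = $response }
        } catch {
            # A failing workspace only produces a warning; the loop carries on
            Write-Warning "$url : $($_.Exception.Message)"
        }
    }
}
```

Called with a list of workspace URLs, '/api/2.0/clusters/list' and a token (for instance one read from an environment variable), it returns one object per reachable workspace; the clusters list endpoint returns its items under a clusters property of each response. Keeping request construction in a separate function means it can be checked without any network access.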

The result is a tree-like design that starts from the cloud subscriptions and drills down through the Databricks workspaces to their groups, users, clusters, etc. For Azure, it looks like the following:

This design is the basis of DBX-Admin, a Databricks cross-workspace administration and auditing tool:

DBX-Admin Main Window
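The tree-like design described above can be modelled as nested ordered dictionaries: subscription, then workspace, then the objects inside it. In this hypothetical sketch, Get-DbxTree and its three fetcher script blocks are illustrative stand-ins; in a real tool the subscription and workspace levels would come from Azure (for example the Az module's Get-AzSubscription) and the lower levels from the Databricks REST API. Passing the fetchers in as script blocks keeps the traversal independent of both the cloud calls and the WPF TreeView that displays it.

```powershell
# Hypothetical sketch of the subscription -> workspace -> groups/clusters tree.
function Get-DbxTree {
    param(
        [Parameter(Mandatory)][string[]]    $Subscriptions,
        [Parameter(Mandatory)][scriptblock] $GetWorkspaces,  # subscription -> workspace names
        [Parameter(Mandatory)][scriptblock] $GetGroups,      # workspace -> group names
        [Parameter(Mandatory)][scriptblock] $GetClusters     # workspace -> cluster names
    )
    $tree = [ordered]@{}
    foreach ($sub in $Subscriptions) {
        $workspaces = [ordered]@{}
        foreach ($ws in (& $GetWorkspaces $sub)) {
            # Leaf level: one entry per object type, wrapped in @() so it is always an array
            $workspaces[$ws] = [ordered]@{
                Groups   = @(& $GetGroups $ws)
                Clusters = @(& $GetClusters $ws)
            }
        }
        $tree[$sub] = $workspaces
    }
    $tree
}

# Usage with stub fetchers (hypothetical subscription and workspace names)
$tree = Get-DbxTree -Subscriptions 'sub-dev', 'sub-prod' `
    -GetWorkspaces { param($s) "$s-engineering", "$s-analytics" } `
    -GetGroups     { param($w) 'admins', 'users' } `
    -GetClusters   { param($w) "$w-cluster" }
$tree['sub-dev']['sub-dev-analytics'].Clusters   # -> sub-dev-analytics-cluster
```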

The tool has been open-sourced and is available on GitHub. Feel free to use it or contribute.


