Are your data warehouse metrics taking a long time to load or timing out? This article describes how PowerMetrics queries and retrieves data from your warehouse and includes practical tips for improving performance.
- The PowerMetrics data warehouse query process
- Configuring warehouse data for optimal performance
- Cache clearing and warming for faster queries
The PowerMetrics data warehouse query process
Let’s begin with a brief description of how data flows in a PowerMetrics data warehouse setup:
- In PowerMetrics, you enter service-specific information to create an account connection that’s used to connect to the data warehouse. Metrics are defined by connecting to columns within a table in the warehouse. Metric settings, such as formatting, aggregation, and naming are selected as you configure the metric.
- To visualize and analyze a metric, one or more metrics queries are sent to the Metrics Service in PowerMetrics via REST API.
- PowerMetrics translates the queries into vendor-specific SQL according to the metric definition and forwards them to the data warehouse for execution. Note: Requests will time out after 3 minutes.
- Once the data is retrieved from the warehouse for a query, it’s cached for 24 hours, by default. You can, optionally, configure a different cache duration (the minimum is 5 minutes).
Note: Table metadata is also cached. If you add a new table to your warehouse, you’ll need to clear the cache to create a metric using the new data.
To get up-to-date data, you can either use a webhook request (recommended) to clear the cache or set a fixed cache expiration duration.
Go here to learn more about cache management. (This example refers to Azure Synapse Analytics but the process is the same regardless of data warehouse service.)
Configuring warehouse data for optimal performance
Query performance is highly dependent on how data is organized in the warehouse. To integrate effectively with PowerMetrics, your data needs to be configured to work well with metrics.
Configuration requirements and recommendations:
- Group measures and dimensions that you want to use for the same metric into a single table or view. Each measure (metric) and its associated dimensions must be defined against a single table or view, and measures that share a common set of dimensions should be grouped together. Measures that don’t have common dimension sets should not be grouped, as doing so can result in sparse tables and potentially lead to performance or data integrity issues.
Note: If your database uses a star schema, with facts (measures) and dimensions in separate tables, we recommend you create an additional view and group your facts and dimensions into a single table for each metric.
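For instance, a star schema can be flattened into one view per metric. The following is a minimal sketch using SQLite as a stand-in for your warehouse; all table, view, and column names are hypothetical, and your warehouse’s SQL dialect may differ slightly:

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (date_id INT, region_id INT, revenue REAL);
    CREATE TABLE dim_date   (date_id INT PRIMARY KEY, full_date TEXT);
    CREATE TABLE dim_region (region_id INT PRIMARY KEY, region_name TEXT);

    -- One flat view per metric, so the metric can be defined against
    -- a single object containing its date, dimensions, and measure.
    CREATE VIEW metric_revenue AS
    SELECT d.full_date   AS sale_date,
           r.region_name AS region,
           f.revenue
    FROM fact_sales f
    JOIN dim_date   d ON d.date_id   = f.date_id
    JOIN dim_region r ON r.region_id = f.region_id;

    INSERT INTO dim_date   VALUES (1, '2024-01-01');
    INSERT INTO dim_region VALUES (10, 'EMEA');
    INSERT INTO fact_sales VALUES (1, 10, 250.0);
""")

# The metric now reads from a single flat view.
rows = conn.execute(
    "SELECT sale_date, region, revenue FROM metric_revenue"
).fetchall()
print(rows)  # → [('2024-01-01', 'EMEA', 250.0)]
```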
- Include at least one date column and one measure column in each table. Multiple metrics can be grouped into a single view or table if they share a common date and other dimensions.
- Index data columns. Indexing date and dimension columns will speed up the query process, especially if you have a large amount of data. Columns are typically indexed using a CREATE INDEX statement, although the syntax may differ slightly based on your SQL database type. Tip: Adding frequently queried measure columns to an index (creating a covering index) can further boost performance, because those queries can be answered from the index alone.
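As a sketch, the following uses SQLite as a stand-in for your warehouse to show an index over the date and dimension columns, with the measure column appended so the index covers the query; all names are hypothetical and your database’s index syntax may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metric_revenue (sale_date TEXT, region TEXT, revenue REAL);

    -- Index the date and dimension columns that queries filter on.
    -- Appending the measure column makes this a covering index, so
    -- matching queries never need to touch the base table.
    CREATE INDEX idx_revenue_date_region
        ON metric_revenue (sale_date, region, revenue);
""")

# Inspect the plan for a typical metric query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT revenue FROM metric_revenue "
    "WHERE sale_date = '2024-01-01' AND region = 'EMEA'"
).fetchall()
for row in plan:
    print(row[3])  # plan detail text mentions the covering index
```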
- Pre-aggregate data. Pre-aggregating transactional data to a daily or higher granularity (depending on the metric) significantly reduces table size and improves query speed. For example, if you have a revenue metric that tracks 1000 individual transactions per day, aggregating the total revenue to daily reduces the data volume by 1000x.
Pre-aggregation may also involve applying aggregation to columns that won’t be used as a metric dimension.
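A minimal sketch of daily pre-aggregation, again using SQLite as a stand-in with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (ts TEXT, amount REAL)")

# Three raw transactions across two days.
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [
    ("2024-01-01 09:15:00", 20.0),
    ("2024-01-01 17:40:00", 30.0),
    ("2024-01-02 11:05:00", 25.0),
])

# Pre-aggregate to daily granularity: one row per day
# instead of one row per transaction.
conn.execute("""
    CREATE TABLE daily_revenue AS
    SELECT date(ts) AS day,
           SUM(amount) AS revenue
    FROM transactions
    GROUP BY date(ts)
""")

daily = conn.execute(
    "SELECT day, revenue FROM daily_revenue ORDER BY day"
).fetchall()
print(daily)  # → [('2024-01-01', 50.0), ('2024-01-02', 25.0)]
```

With real transaction volumes, the same GROUP BY collapses thousands of rows per day into one, which is where the size and speed savings come from.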
- Investigate service-specific techniques to improve query performance. For example, in BigQuery, scheduling queries to run on a recurring basis speeds up performance and can reduce costs. Learn how to schedule queries in a BigQuery warehouse. If your data is stored in Azure Synapse Analytics, we recommend configuring a pipeline to create a dataset.
- Analyze SQL queries to fine-tune performance. After creating a data warehouse metric, you can view and copy the SQL query that’s generated by PowerMetrics. Note: Only the owner of the connection profile can access the query.
The following example describes accessing the SQL query on a metric’s homepage, but it’s also available on dashboards and in Explorer:
- In the left navigation sidebar, click Metrics to open the metric catalog.
- Click the metric you want to investigate.
- Click the 3-dot menu for a metric visualization on the homepage and select View queries.
- Under Queries, click the Copy button.
- Use the copied SQL in your data warehouse and run EXPLAIN on it to understand the query execution plan. Using this information, look for ways to fine-tune the query to improve its performance.
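As a sketch of what an execution plan can reveal, the following uses SQLite’s EXPLAIN QUERY PLAN as a stand-in for your warehouse’s EXPLAIN; the table name is hypothetical and your warehouse will produce different plan text, but the idea of comparing a full scan against an index lookup carries over:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric_revenue (sale_date TEXT, region TEXT, revenue REAL)"
)

# A query copied from PowerMetrics might filter on the date column.
query = "SELECT SUM(revenue) FROM metric_revenue WHERE sale_date = '2024-01-01'"

# Without an index, the plan reports a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

# After indexing the filter column, the plan switches to an index search.
conn.execute("CREATE INDEX idx_date ON metric_revenue (sale_date)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()[0][3]

print(before)  # e.g. SCAN metric_revenue
print(after)   # e.g. SEARCH metric_revenue USING INDEX idx_date (sale_date=?)
```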
Cache clearing and warming for faster queries
PowerMetrics caches your queries for a user-configurable duration of time to help avoid duplicate queries, thereby reducing cost and load on your systems.
To keep your data up to date and reduce the number of queries, add a cache-clearing step to your ELT pipeline that sends a webhook request whenever data is updated in your data warehouse.
Cache warming reduces the load on your database and shortens query run time, leading to improved performance and scalability. In PowerMetrics, you can set up published views of static dashboards that include metrics for the desired queries. Published views run regularly (approximately every 30 minutes) and can be used to pre-populate the cache and warm up the queries used by the dashboard. Learn more about published views.