Data Sources with Multiple Pages

A REST/URL web service query may produce more records than the web service can return in a single response. When this occurs, the web service paginates the response, which means it returns the data in multiple pages and typically indicates the total number of records or pages in the first page of the response. To retrieve all the pages requires multiple queries. 

Note that Klipfolio does not support paginated data from web services that use cursor-based pagination. Because web services that use cursor-based pagination do not provide a total number of records or pages (they provide a pointer to the next query instead), there is no way for a Klipfolio formula to calculate the number of pages required to retrieve the full set of data.

Creating paginated data sources

To access multiple pages of data, you must create two data sources:

  1. A static data source query that returns the first page of data. The data returned by this query will contain a field, for example, 'total_pages' or 'total_records', that indicates the total number of pages or records returned by your query. The API can return only a subset at a time.
  2. A dynamic data source query that uses a variable (for example, pageNumber or recordNumber) to create multiple instances of the query, one for each page of data. You will have to create the variable and assign it a valid value before creating the dynamic data source. For details, see Working with dynamic data sources.

Aggregating paginated data in a formula

Because paginated data is contained in multiple data sources, the full set of records cannot be selected using a single data reference. Instead, the data must be aggregated using the MAP function.

NOTE multiple data source are created as the MAP function executes. If a large number of data sources are required, the formula may time out before the MAP function can complete. If this occurs, restructure your data source query to return less data.

Example 1

In the following formula:

  • total_pages is a field in the initial static data source
  • pageNumber is the variable (as the second parameter in the MAP, typed as a literal string) used in the dynamic data source query that specifies the page requested
  • dataPoint is a field in the dynamic data source

MAP( CUMULATIVE( REPEAT( 1, total_pages) ),

One at a time, this MAP formula assigns page numbers 1 to total_pages to the pageNumber variable (used in the dynamic data source query) and executes the reference to dataPoint in the dynamic data source. 

A few notes:

  • CUMULATIVE( REPEAT( 1, total_pages) ) returns 1, 2, 3...total_pages. You may need to wrap the NUMBERFORMAT function around the CUMULATIVE function to convert its results to integer:

         dataPoint )

  • The third parameter points to the required field in the dynamic data source. If this field returns more than one item, use the MAPFLAT function instead of the MAP function.


Example 2

Some web services use an "offset" parameter in their queries. For example, if a query returns 500 records per response and the initial data source response indicates a total of 1987 records, 4 queries are needed to retrieve all the data:

offset = 0 returns records 1 to 500
offset = 500 returns records 501 to 1000
offset = 1000 returns records 1001 to 1500
offset = 1500 returns records 1501 to 1987

In this case, the first parameter in the MAP requires some arithmetic.

MAP( ARRAY(0, ( CUMULATIVE( REPEAT( 1, FLOOR( total_records/500 ) ) ) * 500 ) ),
         dataPoint )

A few notes:

  • The first item in the ARRAY is 0 to handle the first offset.
  • ( CUMULATIVE( REPEAT( 1, FLOOR( total_records/500 ) ) ) * 500 ) calculates the remaining offsets.
  • FLOOR( total_records/500 ) calculates the number of pages of data minus 1 (because the first page uses offset=0), in this case, 3. CUMULATIVE returns 1, 2, 3 which is then multiplied by 500 to return 500, 1000, 1500.