In this article, you will find information and examples on the following topics associated with working with JSON and XML data sources.
- JSON and XML data structure
- Selecting a JSON or XML data source
- Using the Selection options in JSON and XML data
- XPath Expressions (with examples)
- XPath Axes
- Klipfolio XPath Functions for JSON data
- Special Cases
- Advanced Usage
About the structure of JSON and XML data
JSON and XML are hierarchical data formats.
Where tabular data is referenced by column and row, JSON and XML data is referenced by its path, which indicates its position in the hierarchy. This path, also called the XPath, describes the levels traversed to reach the elements. For example, in the following data source:
player/first_nameis the XPath of the
player/team/whais the XPath of the
JSON data is made up of these elements:
- objects: one or more field name and value pairs, indicated in Klipfolio by curly braces.
- arrays: lists of elements of the same type, indicated in Klipfolio by square brackets.
Selecting a JSON or XML data source
The first image below shows how to choose a data source in the formula editor. The second image shows how the XPath is displayed in the formula once the data source is chosen.
; outlined in purple are the data source reference and enclose the editable XPath, which is outlined in red.
; is only displayed when the XPath is being edited.
Some data sources use '@' as the first character in their field name (shown in this example) or as an attribute. It is important not to get them confused with the data source reference.
Using the Selection Options in JSON or XML data
When you select data from a JSON or XML data source, the selected field’s path, or XPath, is displayed in the formula editor. You can type the path or you can click on the Selection Options link.
In the following image, the XPath of the
first_name element is
By default, the values of all peer elements are selected (equivalent to selecting a column of data in a table). If you need to change this selection, click Selection Options, and select the required option:
- Select only this <field name> element. Only one specific <field_name> element is selected (equivalent to selecting a specific cell in Excel). The formula editor displays the specific path:
- Select all peer <field name> elements. All <field_name> elements at this level of the hierarchy are selected (equivalent to selecting a column of data in a table). The formula editor displays the path:
- Select all <field_name> elements. All <field_name> elements at any level of the hierarchy are selected:
- Select all <field_name> elements in this parent element. All <field_name> elements in this branch of the hierarchy are selected:
An XPath expression is used to select elements based on specified criteria.
Some common functions used in XPath expressions:
Example 1: select elements by matching a specific value
To select the last names of players on team Winnipeg:
- Select the last_name of a player:
@/player/last_nameThis will return all
- Type the expression,
/playerto select only the last names of players on team Winnipeg.
The resulting XPath:
This says: for all player elements with descendant team/name equal to Winnipeg, select the last_name element.
Example 2: select elements based on a field name
This example uses the contains function and the name function to select all elements based on a field name.
This says: for all /player elements, select any values where the field name contains: "name"
This expression returns Anders, Hedberg, Maurice, Richard, Rocket, Wayne, Gretzky, The Great One.
Example 3: selecting all descendant values
@/player/* selects all immediate descendant values in the player element.
The Great One
@/player/* selects all immediate descendant values (denoted by /*) in the second player element.
Example 4: select JSON elements whose names begin with non-alphabetic characters
JSON elements with field names that begin with non-alphabetic characters (as shown in the following data source) cannot be selected directly. Instead it is necessary to manually type the path in the formula editor.
Note: The purple
@ refers to the data source of the XPath. See selecting a JSON or XML data source.
These examples use the name function.
To select the @date field name in the following data source, type:
To select @date field in the first "day" element:
XPath axes (child, descendant, following-sibling, preceding-sibling) indicate the position of an element relative to another element in the hierarchy. For example,
- first_name, last_name, nickname, teams/name, teams/wha and teams/nhl are descendants of player.
- first_name, last_name and nickname are children of player.
- first_name, last_name, nickname and teams are siblings of each other, preceding-sibling or following-sibling depending on the relative positions (XML) or alphabetical order (JSON} of their names:
- first_name is a preceding-sibling of last_name.
- last_name is a following-sibling of first_name.
To select first_names for all players with a nickname child:
To select all first_names with a nickname following-sibling (this will return the same values as
@/player[child::nickname]/first_name ) :
Sometimes more complex XPath manipulation is required to access and align JSON data.
kf:element_at( object, index )
Select an element at the position specified by index. This function is used when a field name is unnamed and is instead referenced by number (position).
For example, using the following data source:
This function is used to “fill in” blanks for an element that is not included in every record. This is useful for data alignment.
For example, using the following data source:
@/player/first-name Returns: Anders, Maurice, Wayne
@/player/nickname Returns: Rocket, The Great One
As is, the data incorrectly aligns Anders with the nickname, Rocket, and Maurice with the nickname, The Great One.
kf:fill_elements to align this data correctly:
Returns: <blank>, Rocket, The Great One
Note: This function does not work for nested arrays.
kf:names function returns the names of all fields contained in an element. This is useful when the field names identify categories such as demographics or countries.
For example, Facebook can return page fans data by gender and age. To create a list of these categories for a table column for a bar/line chart x-axis, use:
Which returns data shown in the following image:
@kf:names(/data/values/value/M.25-34) which returns only value M.25-34
The following show more examples using the data source featured in the previous function (
@kf:names(/player/team) Returns: name, wha
@kf:names(/player) Returns: first_name, last_name, team
- Unnamed fields cannot be selected directly; see kf:element_at.
- Field names that begin with a non-alphabetic character cannot be selected directly; see the Example 4 in XPath Expressions.
Klipfolio functions like DATASOURCE and CONCAT can be used together for more flexible XPath editing.
Select elements matching a set of values
In the XPath Expressions section, the first example selected elements by matching a specific value, where
/player[team/name='Winnipeg']/last_name selected the last_name for all /player elements with descendant team/name equal to Winnipeg.
- Use CONCAT to manipulate the XPath so you can add a variable.
- In the XPath,
/player[team/name='Winnipeg']/last_name, replace "Winnipeg", with the variable (city).
If city= Winnipeg, this formula returns
If city= Montreal, this formula returns
- Then use DATASOURCE to specify which data source (using the data source ID) to apply this reference to.