How to Query a JSON Column in Redshift

In Amazon Redshift, JSON data can be stored in columns of type SUPER, allowing you to work with semi-structured data within your relational database. In this tutorial, we'll explore how to query JSON columns and extract useful information with SQL functions in Redshift.

Step 1: Creating a Table with a JSON Column

First, let’s create a simple table that contains a column to store JSON data. In this case, we will create a table named customer_data with a column customer_info to store the JSON data.


CREATE TABLE customer_data (
    id INT,
    customer_info SUPER
);
                                

Step 2: Inserting JSON Data

Now, let’s insert a sample record with JSON data into the customer_info column. The SUPER type allows us to store any valid JSON format.


INSERT INTO customer_data (id, customer_info)
VALUES 
(1, '{"name": "John Doe", "email": "john@example.com", "orders": [{"id": 1, "total": 150}, {"id": 2, "total": 200}]}');
                                

Step 3: Querying JSON Data

Once your data is inserted, you can use Redshift's built-in JSON functions to query the SUPER column. The json_extract_path_text function allows you to extract specific values from the JSON structure. For example, you can extract the customer's name and email from the customer_info column.


SELECT
    id,
    json_extract_path_text(customer_info, 'name') AS name,
    json_extract_path_text(customer_info, 'email') AS email
FROM customer_data;
                                

Step 4: Querying Nested JSON Data

If your JSON data contains nested arrays or objects, you can navigate these structures using additional JSON functions. For instance, to extract the total amount of the first order:


SELECT
    id,
    json_extract_path_text(customer_info, 'orders', 0, 'total') AS first_order_total
FROM customer_data;
                                

Best Practices for Querying JSON Data

  • Use the json_extract_path_text function for extracting text data from JSON fields.
  • Consider using the json_parse function when working with large JSON objects or when you need to perform complex parsing operations.
  • Ensure your SUPER columns are indexed to improve query performance on large datasets.
  • Remember that Redshift does not support all JSON features available in PostgreSQL, so test your queries carefully.

By following these steps, you can efficiently query JSON data in Amazon Redshift and integrate it into your analytical workflows. Experiment with different JSON functions and explore the full potential of the SUPER column for your data analysis.