BigQuery Data Types

Google BigQuery offers a rich set of data types to help you store and process structured, semi-structured, and nested data efficiently. Understanding these data types is key to designing optimized schemas and writing performant queries.

Numeric Data Types

  • INT64: A 64-bit integer used for whole numbers.
  • FLOAT64: A 64-bit floating-point number, useful for approximate calculations.
  • NUMERIC: A decimal type with up to 38 digits of precision, ideal for financial or exact calculations.
  • BIGNUMERIC: An extended precision decimal type supporting up to 76 digits.

String and Boolean Types

  • STRING: Variable-length Unicode text, perfect for names, descriptions, and other text data.
  • BOOL: Boolean values representing TRUE or FALSE.

Date and Time Types

  • DATE: Represents a calendar date without time.
  • DATETIME: Combines date and time without timezone context.
  • TIME: Represents time of day, independent of any date.
  • TIMESTAMP: Date and time with timezone, stored in UTC.

Complex and Semi-Structured Types

  • ARRAY: An ordered list of elements, all of the same type.
  • STRUCT: A container of named fields, similar to a record or JSON object.
  • GEOGRAPHY: Stores geographic data points, lines, and polygons in WKT or GeoJSON format.

JSON Data Type

  • JSON: Holds semi-structured JSON data, allowing for flexible schemas and dynamic data ingestion.

Best Practices

  • Use NUMERIC or BIGNUMERIC when precision is critical, such as in financial reports.
  • Take advantage of ARRAY and STRUCT to handle nested or repeated data without flattening your schema.
  • Use TIMESTAMP for time-sensitive data across time zones to avoid confusion.
  • When working with text, prefer STRING unless you specifically need to store raw binary using BYTES.

Conclusion

Choosing the right data type in BigQuery is crucial for achieving optimal performance, accuracy, and scalability. By understanding the strengths and limitations of each type, you can design schemas that are robust, flexible, and efficient for both current and future needs.