This article explains the data schema and format in Clickstream Analytics on AWS. This solution uses an event-based data model to store and analyze clickstream data. Every activity (such as click and view) on the clients is modeled as an event with dimensions, and each dimension represents a parameter of the event. Dimensions are common for all events.
You can use JSON objects to store custom event parameters as key-value pairs into special dimensions, which helps you to collect information that is specific for your business. Those JSON objects are stored in special data types, which allow you to extract the values in the analytics engines.
For each project, the solution creates a database with name of in Amazon Redshift and Athena. Each app will have a schema with name of app_id . In Athena, all tables are added partitions of app_id , year, month, and day. Based on the event data, the solution's data-processing module creates the following four base tables:
Each column in the tables represents a specific parameter for an event, user, or item. Some parameters are nested within a Super field in Amazon Redshift or a Map field in Athena. Those fields (such as custom_parameters , user_properties ) contain parameters that are repeatable. The following table describes the fields.
Field Name | Data Type - Redshift | Data Type - Athena | Description |
---|---|---|---|
event_timestamp | TIMESTAMP | TIMESTAMP | The timestamp (in microseconds, UTC) when the event was logged on the client. |
event_id | VARCHAR | STRING | Unique ID for the event. |
event_time_msec | BIGINT | BIGINT | The time in UNIX timestamp format (microseconds) when the event was logged on the client. |
event_name | VARCHAR | STRING | The name of the event. |
event_value | DOUBLE PRECISION | FLOAT | The value of the event's "value" parameter. |
event_value_currency | VARCHAR | STRING | The currency of the value associated with the event. |
event_bundle_sequence_id | BIGINT | BIGINT | The sequential ID of the bundle in which these events were uploaded. |
ingest_timestamp | BIGINT | BIGINT | Timestamp offset between collection time and upload time in micros. |
device.mobile_brand_name | VARCHAR | STRING | The device brand name. |
device.mobile_model_name | VARCHAR | STRING | The device model name. |
device.manufacturer | VARCHAR | STRING | The device manufacturer name. |
device.carrier | VARCHAR | STRING | The device network provider name. |
device.network_type | VARCHAR | STRING | The network_type of the device, e.g., WIFI, 5G |
device.operating_system | VARCHAR | STRING | The operating system of the device. |
device.operating_system_version | VARCHAR | STRING | The OS version. |
device.vendor_id | VARCHAR | STRING | IDFV (present only if IDFA is not collected). |
device.advertising_id | VARCHAR | STRING | Advertising ID/IDFA. |
device.system_language | VARCHAR | STRING | The OS language. |
device.time_zone_offset_seconds | BIGINT | BIGINT | The offset from GMT in seconds. |
device.ua_browser | VARCHAR | STRING | The browser in which the user viewed content, derived from User Agent string |
device.ua_browser_version | VARCHAR | STRING | The version of the browser in which the user viewed content, derive from User Agent |
device.ua_device | VARCHAR | STRING | The device in which user viewed content, derive from User Agent. |
device.ua_device_category | VARCHAR | STRING | The device category in which user viewed content, derive from User Agent. |
device.ua_os | VARCHAR | STRING | The operating system of the device in which user viewed content, derive from User Agent. |
device.ua_os_version | VARCHAR | STRING | The operating system version of the device category in which user viewed content, derive from User Agent. |
device.ua | SUPER | MAP | The parsed User Agent in key-value pairs |
device.screen_width | VARCHAR | STRING | The screen width of the device. |
device.screen_height | VARCHAR | STRING | The screen height of the device. |
device.viewport_width | VARCHAR | STRING | The screen width of the browser viewport. |
device.viewport_height | VARCHAR | STRING | The screen height of the browser viewport. |
geo.continent | VARCHAR | STRING | The continent from which events were reported, based on IP address. |
geo.sub_continent | VARCHAR | STRING | The subcontinent from which events were reported, based on IP address. |
geo.country | VARCHAR | STRING | The country from which events were reported, based on IP address. |
geo.region | VARCHAR | STRING | The region from which events were reported, based on IP address. |
geo.metro | VARCHAR | STRING | The metro from which events were reported, based on IP address. |
geo.city | VARCHAR | STRING | The city from which events were reported, based on IP address. |
geo.locale | VARCHAR | STRING | The locale information obtained from device. |
traffic_source_name | VARCHAR | STRING | Name of the marketing campaign that acquired the user when the events were reported. |
traffic_source_medium | VARCHAR | STRING | Name of the medium (paid search, organic search, email, etc.) that acquired the user when the events were reported. |
traffic_source_campaign | VARCHAR | STRING | The marketing campaign (derive from utm_campaign) associated with the event. |
traffic_source_content | VARCHAR | STRING | The marketing campaign content (derive from utm_content) associated with the event. |
traffic_source_term | VARCHAR | STRING | The marketing campaign term (derive from utm_term) associated with the event. |
traffic_source_campaign_id | VARCHAR | STRING | The marketing campaign id (derive from utm_id) associated with the event. |
traffic_source_clid | VARCHAR | STRING | The click id associated with the event. |
traffic_source_clid_platform | VARCHAR | STRING | The platform of the click id associated with the event. |
traffic_source_channel_group | VARCHAR | STRING | The channel group (assigned by traffic classification rules) associated with the event. |
traffic_source_category | VARCHAR | STRING | The source category (i.e., Search, Social, Video, Shopping) based on the traffic source associated with the event. |
user_first_touch_time_msec | BIGINT | BIGINT | The time in UNIX timestamp format (microseconds) when the user first touch the app or website. |
app_package_id | VARCHAR | STRING | The package name or bundle ID of the app. |
app_version | VARCHAR | STRING | The app's versionName (Android) or short bundle version. |
app_title | VARCHAR | STRING | The app's name. |
app_id | VARCHAR | STRING | The App ID (created by this solution) associated with the app. |
app_install_source | VARCHAR | STRING | The store from which user installed the app. |
platform | VARCHAR | STRING | The data stream platform (Web, IOS or Android) from which the event originated. |
project_id | VARCHAR | STRING | The project id associated with the app. |
screen_view_screen_name | VARCHAR | STRING | The screen name associated with the event. |
screen_view_screen_id | VARCHAR | STRING | The screen class id associated with the event. |
screen_view_screen_unique_id | VARCHAR | STRING | The unique screen id associated with the event |
screen_view_previous_screen_name | VARCHAR | STRING | The previous unique screen id associated with the event. |
screen_view_previous_screen_id | VARCHAR | STRING | The previous unique screen id associated with the event. |
screen_view_previous_screen_unique_id | VARCHAR | STRING | The previous unique screen id associated with the event. |
screen_view_entrances | BOOLEAN | BOOLEAN | Whether the screen is the entrance view of the session. |
page_view_page_referrer | VARCHAR | STRING | The referrer page url. |
page_view_page_referrer_title | VARCHAR | STRING | The referrer page title. |
page_view_previous_time_msec | BIGINT | BIGINT | The timestamp of the previous page_view event. |
page_view_engagement_time_msec | BIGINT | BIGINT | The previous page_view duration in milliseconds. |
page_view_page_title | VARCHAR | STRING | The title of the webpage associated with the event. |
page_view_page_url | VARCHAR | STRING | The url of the webpage associated with the event. |
page_view_page_url_path | VARCHAR | STRING | The url path of the webpage associated with the event. |
page_view_page_url_query_parameters | SUPER | MAP | The query parameters in key-value pairs of the page url associated with the event. |
page_view_hostname | VARCHAR | STRING | The host name of the web page associated with the event. |
page_view_latest_referrer | VARCHAR | STRING | The url of the latest external referrer. |
page_view_latest_referrer_host | VARCHAR | STRING | The hostname of the latest external referrer. |
page_view_entrances | BOOLEAN | BOOLEAN | Whether the page is the entrance view of the session. |
app_start_is_first_time | BOOLEAN | BOOLEAN | Whether the app start is a new app launch. |
upgrade_previous_app_version | VARCHAR | STRING | Previous app version before app upgrade event. |
upgrade_previous_os_version | VARCHAR | STRING | Previous os version before OS upgrade event. |
search_key | VARCHAR | STRING | The name of the keyword in the URL when user perform search on web site. |
search_term | VARCHAR | STRING | The search content in the URL when user perform search on web site. |
outbound_link_classes | VARCHAR | STRING | The content of class in tag that associated with the outbound link. |
outbound_link_domain | VARCHAR | STRING | The domain of href in tag that associated with the outbound link. |
outbound_link_id | VARCHAR | STRING | The content of id in tag that associated with the outbound link. |
outbound_link_url | VARCHAR | STRING | The content of href in tag that associated with the outbound link. |
outbound_link | BOOLEAN | BOOLEAN | Whether the link is outbound link or not. |
user_engagement_time_msec | BIGINT | BIGINT | The user engagement duration in milliseconds. |
user_id | VARCHAR | STRING | The unique ID assigned to a user through setUserId() API. |
user_pseudo_id | VARCHAR | STRING | The pseudonymous id generated by SDK for the user. |
session_id | VARCHAR | STRING | The session id associated with the event. |
session_start_time_msec | BIGINT | BIGINT | The start time in UNIX timestamp of the session. |
session_duration | BIGINT | BIGINT | The duration the session lasts, in milliseconds. |
session_number | BIGINT | BIGINT | Number of the sessions generated from the client. |
scroll_engagement_time_msec | BIGINT | BIGINT | The engagement time on the web page until user scroll. |
sdk_error_code | VARCHAR | STRING | The error code generated by SDK when an event is invalid in some way. |
sdk_error_message | VARCHAR | STRING | The error message generated by SDK an event is invalid in some way. |
sdk_version | VARCHAR | STRING | The version of the SDK. |
sdk_name | VARCHAR | STRING | The name of the SDK. |
app_exception_message | VARCHAR | STRING | The exception message when the app crashes or throws an exception. |
app_exception_stack | VARCHAR | STRING | The exception stack trace when the app crashes or throws an exception. |
custom_parameters_json_str | VARCHAR | STRING | All the custom event parameters stored in key-value pairs. |
custom_parameters | SUPER | MAP | All the custom event parameters stored in key-value pairs. |
process_info | SUPER | MAP | Store information about the data processing. |
created_time | TIMESTAMP | TIMESTAMP | Store information about the data processing. |
Field Name | Data Type - Redshift | Data Type - Athena | Description |
---|---|---|---|
event_timestamp | BIGINT | STRING | The timestamp of when the user attributes was collected. |
user_id | VARCHAR | STRING | The unique ID assigned to a user through setUserId() API. |
user_pseudo_id | VARCHAR | STRING | The pseudonymous id generated by SDK for the user. |
user_properties | SUPER | ARRAY | Properties of the user. |
user_properties_json_str | VARCHAR | STRING | Properties of the user. |
first_touch_timestamp | BIGINT | BIGINT | The time (in microseconds) at which the user first opened the app or visited the site. |
first_visit_date | Date | Date | Date of the user's first visit. |
first_referer | VARCHAR | STRING | The first referer detected for the user. |
first_traffic_source | VARCHAR | STRING | The the network source that acquired the user that was first detected for the user, e.g., Google, Baidu |
first_traffic_source_medium | VARCHAR | STRING | The medium of the network source that acquired the user that was first detected for the user, e.g., paid search, organic search, email, etc |
first_traffic_source_campaign | VARCHAR | STRING | The name of the marketing campaign that acquired the user that was first detected for the user. |
first_traffic_source_content | VARCHAR | STRING | The marketing campaign content that acquired the user that was first detected for the user. |
first_traffic_source_term | VARCHAR | STRING | The keyword of the marketing ads that acquired the user that was first detected for the user. |
first_traffic_source_campaign_id | VARCHAR | STRING | The id of the marketing campaign that acquired the user that was first detected for the user. |
first_traffic_source_clid_platform | VARCHAR | STRING | The click id platform of the marketing campaign that acquired the user that was first detected for the user. |
first_traffic_source_clid | VARCHAR | STRING | The click id of the marketing campaign that acquired the user that was first detected for the user. |
first_traffic_source_channel_group first_traffic_source_category | VARCHAR | STRING | The channel group of the traffic source that acquired the user that was first detected for the user. |
first_app_install_source | VARCHAR | STRING | The source category (i.e., Search, Social, Video, Shopping) based on the traffic source that acquired the user for the first time. |
process_info | SUPER | MAP | The install channel for the user, e.g., Google Play Store information about the data processing. |
created_time | TIMESTAMP | TIMESTAMP | Store information about the data processing. |
Field Name | Data Type - Redshift | Data Type - Athena | Description |
---|---|---|---|
event_timestamp | TIMESTAMP | STRING | The timestamp of when the event occurred. |
user_pseudo_id | VARCHAR | STRING | The pseudonymous ID generated by the SDK for the user. |
session_id | VARCHAR | STRING | The ID assigned to a session. |
user_id | VARCHAR | STRING | The unique ID assigned to a user through the setUserId() API. |
session_number | BIGINT | INT | The sequence number of the session in the client. |
session_start_time_msec | BIGINT | BIGINT | The start time of the session in milliseconds. |
session_source | VARCHAR | STRING | The traffic source of the session. |
session_medium | VARCHAR | STRING | The traffic source medium of the session. |
session_campaign | VARCHAR | STRING | The traffic source campaign of the session. |
session_content | VARCHAR | STRING | The traffic source content of the session. |
session_term | VARCHAR | STRING | The traffic source term of the session. |
session_campaign_id | VARCHAR | STRING | The traffic source campaign ID of the session. |
session_clid_platform | VARCHAR | STRING | The platform of the CLID (Click ID) of the session. |
session_clid | VARCHAR | STRING | The CLID (Click ID) of the session. |
session_channel_group | VARCHAR | STRING | The traffic source channel group of the session. |
session_source_category | VARCHAR | STRING | The traffic source category of the session source. |
process_info | SUPER | MAP | Additional data processing information. |
created_time | TIMESTAMP | STRING | The timestamp of when the session data was created. |
Field Name | Data Type - Redshift | Data Type - Athena | Description |
---|---|---|---|
event_timestamp | TIMESTAMP | STRING | The timestamp of when the event occurred. |
event_id | VARCHAR | STRING | The ID of the event. |
event_name | VARCHAR | STRING | The name of the event. |
platform | VARCHAR | STRING | The platform associated with the event. |
user_pseudo_id | VARCHAR | STRING | The pseudonymous ID generated by the SDK for the user. |
user_id | VARCHAR | STRING | The unique ID assigned to a user through the setUserId() API. |
item_id | VARCHAR | STRING | The ID of the item. |
name | VARCHAR | STRING | The name of the item. |
brand | VARCHAR | STRING | The brand of the item. |
currency | VARCHAR | STRING | The currency associated with the item price. |
price | DOUBLE PRECISION | DOUBLE | The price of the item. |
quantity | DOUBLE PRECISION | DOUBLE | The quantity of the item in the event. |
creative_name | VARCHAR | STRING | The name of the creative associated with the item. |
creative_slot | VARCHAR | STRING | The slot of the creative associated with the item. |
location_id | VARCHAR | STRING | The ID of the location associated with the item. |
category | VARCHAR | STRING | The category of the item. |
category2 | VARCHAR | STRING | The second category of the item. |
category3 | VARCHAR | STRING | The third category of the item. |
category4 | VARCHAR | STRING | The fourth category of the item. |
category5 | VARCHAR | STRING | The fifth category of the item. |
custom_parameters_json_str | VARCHAR | STRING | The JSON string representation of custom parameters. |
custom_parameters | SUPER | MAP | Additional custom parameters. |
process_info | SUPER | MAP | Additional process information. |
created_time | TIMESTAMP | STRING | The timestamp of when the item data was created. |