Posts

Showing posts from March, 2024

AWS Cloud Services

We recently took part in a Customer Journey Analytics project that captured user interactions from websites and fed them into Kinesis Data Streams. AWS Lambda functions then processed these events and started AWS Glue jobs, a set of Python scripts that use PySpark for data transformation. The processed data was then stored in Athena for further analysis, and numerous additional AWS Lambda functions were deployed to query it. Finally, we developed an interactive dashboard that presents the analyzed data as graphs and reports.

Data Capture and Streaming: Used Kinesis Data Streams to capture user interactions on websites in real time. Implemented a pipeline that funnels user events into Kinesis for ingestion.

Serverless Orchestration with AWS Lambda: Used AWS Lambda functions to orchestrate the subsequent data processing tasks. Triggered AWS Glue Jobs, implemented as Python scripts with PySpark,...
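The Lambda-to-Glue hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the job name `customer-journey-transform`, the `--event_count` argument, and the injectable `glue_client` parameter are all assumptions made for the example. It relies only on the standard Kinesis event shape (base64-encoded `data` under each record's `kinesis` key) and the boto3 Glue call `start_job_run`.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's data base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    return payloads


def handler(event, context, glue_client=None,
            job_name="customer-journey-transform"):  # hypothetical job name
    """Decode the incoming batch and start one Glue job run for it."""
    events = decode_kinesis_records(event)
    if not events:
        return {"started": False, "event_count": 0}
    if glue_client is None:
        # Imported lazily so the decoding logic can run without AWS access.
        import boto3
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,
        # Hypothetical job argument; Glue passes these to the script.
        Arguments={"--event_count": str(len(events))},
    )
    return {
        "started": True,
        "event_count": len(events),
        "job_run_id": response["JobRunId"],
    }
```

Passing the Glue client in as a parameter is only a convenience for local testing; in a deployed function the default boto3 client would be used.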

Airtable Insert, Update or Upsert by APIs

We needed to import thousands of records from an Excel file into Airtable. Because Airtable already contained thousands of records, each incoming record had to be either inserted or updated, depending on whether its primary column matched an existing record. We first implemented the following approach, optimized for speed:

Retrieve all data from Airtable up front.
Match the existing data against the new data from the Excel file.
Split the dataset into records to insert and records to update.
Use the Airtable APIs to insert and update records in chunks of 10, because Airtable accepts a maximum of 10 records per API request.
Account for potential duplicate records caused by other users or scripts inserting duplicates.

We still see performance problems, such as delays in script execution, which can lead to missing records from other sources if Airtable receives additional data in the meantime. We recently discovered an informative article highlighting Airtable's introduction...
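The match, split, and chunk steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the original script: the helper names `split_for_upsert` and `chunked` are hypothetical, existing records are assumed to have the `{"id": ..., "fields": {...}}` shape, and the actual Airtable API calls are left out.

```python
def split_for_upsert(existing, incoming, key):
    """Split incoming rows into inserts and updates by matching on `key`.

    existing: records already in Airtable, shaped {"id": ..., "fields": {...}}
    incoming: plain field dicts, e.g. rows read from the Excel file
    key: name of the primary column used for matching
    """
    ids_by_key = {
        rec["fields"][key]: rec["id"]
        for rec in existing
        if rec["fields"].get(key) is not None
    }
    to_insert, to_update = [], []
    for row in incoming:
        record_id = ids_by_key.get(row.get(key))
        if record_id is None:
            # No existing record with this key: create a new one.
            to_insert.append({"fields": row})
        else:
            # Key already present: update the matching record.
            to_update.append({"id": record_id, "fields": row})
    return to_insert, to_update


def chunked(items, size=10):
    """Yield consecutive slices of at most `size` items (10 per request)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each chunk from `chunked(to_insert)` and `chunked(to_update)` would then go out as one API request. Building the lookup dictionary once keeps the matching step linear in the number of records.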

Data Transformation

Data Integration

Data can be extracted from many different sources, so integrating different data entities into a single data entity is a crucial and complex step for us. It includes the following steps:

Data Criteria: Decide what data needs to be integrated.
Data Mapping: Define the mapping, that is, which data values of one data entity map to those of another data entity.
Data Refining: Define the refining rules applied after integration, that is, which data is not required, since there is a chance of duplicated, incomplete, or corrupted data entities.
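As a rough illustration of the mapping and refining steps, the sketch below merges rows from several sources into one unified entity list. Everything in it is an assumption made for the example: the function name `integrate`, the source names, and the field names are hypothetical, and "refining" is reduced to dropping incomplete entities and keeping the first occurrence of each duplicate key. Detecting corrupted values would need rules specific to the data.

```python
def integrate(sources, field_maps, required, key):
    """Merge rows from several sources into one list of unified entities.

    sources: {source_name: [row dicts]}
    field_maps: {source_name: {source_field: unified_field}}
    required: unified fields that must be present and non-empty
    key: unified field used to detect duplicates
    """
    needed = set(required) | {key}
    merged = {}
    for name, rows in sources.items():
        mapping = field_maps[name]
        for row in rows:
            # Data mapping: rename source fields to the unified schema;
            # fields without a mapping are dropped.
            entity = {mapping[f]: v for f, v in row.items() if f in mapping}
            # Data refining: skip incomplete entities.
            if any(entity.get(f) in (None, "") for f in needed):
                continue
            # Data refining: keep only the first entity seen for each key.
            merged.setdefault(entity[key], entity)
    return list(merged.values())
```

For example, two sources that name the same attribute `mail` and `email` can both be mapped to a unified `email` field and then deduplicated on it.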