Datalake Status - one more

The datalake has been running for the past couple of months and there have been no issues so far. The AWS Batch jobs run as expected and the data lands in S3 correctly. AWS Glue crawls the data in the bucket every two weeks. I could easily change that to once a month, but I prefer every two weeks for now because I want to make sure the crawling keeps working.
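For reference, this is roughly how that schedule looks as a cron expression. A small sketch with hypothetical names: AWS cron syntax has no true "every two weeks", so firing on the 1st and 15th of each month is the usual approximation.

```python
# Sketch (names hypothetical): build a Glue-style cron expression that
# approximates "every two weeks" by firing on the 1st and 15th.

def twice_monthly_cron(hour_utc: int = 3) -> str:
    """Glue cron expression firing at hour_utc on the 1st and 15th."""
    return f"cron(0 {hour_utc} 1,15 * ? *)"

# The resulting string could then be handed to the crawler, e.g. with
# boto3:
#   boto3.client("glue").update_crawler(
#       Name="datalake-crawler",       # hypothetical crawler name
#       Schedule=twice_monthly_cron(),
#   )
```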

What is next? Well, I need to figure out how to speed up the conversion process. The Python job that calls the database, retrieves the data, converts it, and uploads it to S3 is taking too long and needs to be made a whole lot faster. Not that speed is really the problem here: I don't care whether the whole process takes 3 hours or 10 hours, since it runs automatically without any interaction. But it would be interesting to see how much I can speed it up using Rust, and I see it as a learning process too.
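The current pipeline shape looks roughly like the sketch below. All names here are hypothetical, and sqlite3 stands in for the real database; the actual batch job would use boto3's `put_object` for the upload step.

```python
# Rough sketch of the fetch -> convert -> upload pipeline (names and
# schema are hypothetical; sqlite3 stands in for the real database).
import json
import sqlite3


def fetch_rows(conn: sqlite3.Connection) -> list:
    """Pull the rows that the batch job is going to convert."""
    return conn.execute("SELECT id, payload FROM events").fetchall()


def convert(rows: list) -> str:
    """Convert rows to JSON Lines, a format Glue crawls without fuss."""
    return "\n".join(
        json.dumps({"id": row_id, "payload": payload})
        for row_id, payload in rows
    )


def upload(body: str, bucket: str, key: str) -> None:
    """Push the converted data to S3 (requires AWS credentials)."""
    import boto3  # assumed available in the batch job image

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body.encode())
```

Since each row converts independently, this is also the step a Rust rewrite (or simple batching and parallelism) could attack first.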

Let the conversion process begin! I have already created a branch ;)