Difference between revisions of "ETL"

2,200 bytes added ,  14:06, 22 April 2022
(Add timestamp pitfalls)


== Deploying ETL jobs ==
Once tested, an ETL job can be deployed by 1) moving the source code for the ETL job to a production server and 2) scheduling the job to run automatically.
 
Assuming that you are developing the ETL job on a separate computer and in a dev branch of <code>rocket-etl</code>, this is a typical deployment workflow:
# Use <code>> git add -p</code> to construct atomic commits (each of which should group thematically related changes) and <code>> git commit -m "<Meaningful commit description>"</code> to commit them. Repeat until all the code that needs to be deployed has been committed. If you need to add a new file (like <code>sky_maintenance.py</code>), try <code>> git add sky_maintenance.py</code> and <code>> git commit -m "Add ETL job for sky-maintenance data"</code>.
# If you have any other changes to your dev branch that aren't ready for deployment, type <code>> git stash save</code> to temporarily stash those changes (so you can switch to the <code>master</code> branch).
# Switch to the <code>master</code> branch: <code>> git checkout master</code>
# Merge the changes committed to the <code>dev</code> branch into the <code>master</code> branch: <code>> git merge dev</code>
# Push the changes to GitHub: <code>> git push</code>
# Switch back to the <code>dev</code> branch: <code>> git checkout dev</code>
# Restore the stashed code: <code>> git stash pop</code>
# Shell into the production server with <code>ssh</code>.
# Navigate to wherever the <code>rocket-etl</code> directory is.
# Pull the changes from GitHub: <code>> git pull</code>
# At this point, it's usually best to test the ETL job to make sure it will work in the production environment. Either the <code>test</code> or the <code>to_file</code> command-line parameter can be used if you're not ready to publish data to the production dataset. Failure at this stage usually means that some code or parameter that was supposed to be committed to the git repository didn't get committed or is not defined on the production server.
# Schedule the job by writing a cron job: run <code>> crontab -e</code>, duplicate a launchpad line that's already in the crontab file, and edit the copy so that it runs the new ETL job on the desired schedule.
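
The dev-side part of the workflow above (stash, merge <code>dev</code> into <code>master</code>, push, switch back, restore the stash) can be sketched as a small shell function. This is only an illustration under stated assumptions: the branch names <code>dev</code> and <code>master</code> come from the steps above, while the <code>run</code> helper, the <code>DRY_RUN</code> variable, and the function name are hypothetical and not part of <code>rocket-etl</code>. It uses <code>git stash push</code>, the newer spelling of <code>git stash save</code>.

```shell
#!/bin/sh
# Sketch of the dev-side steps: stash, merge dev into master, push, restore.
# Assumes branches named "dev" and "master" and an already configured remote.
set -eu

# run: execute a command, or only print it when DRY_RUN=1.
# (run and DRY_RUN are hypothetical helpers, not part of rocket-etl.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

merge_dev_into_master() {
    stashed=0
    # Stash only when tracked files have uncommitted changes; otherwise
    # "git stash pop" at the end could restore an unrelated older stash.
    if [ -n "$(git status --porcelain --untracked-files=no 2>/dev/null)" ]; then
        run git stash push   # newer spelling of "git stash save"
        stashed=1
    fi
    run git checkout master
    run git merge dev
    run git push
    run git checkout dev
    if [ "$stashed" = "1" ]; then
        run git stash pop
    fi
}
```

To preview the commands without changing anything, call the function from inside the repository with <code>DRY_RUN=1</code> set. Because of <code>set -e</code>, a failed merge or push stops the script where it is, possibly on <code>master</code> with the stash still saved, so in that case the remaining steps would need to be finished by hand.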
[[Category:Onboarding]]