Difference between revisions of "ETL"

212 bytes added ,  17:19, 2 June 2023
no edit summary
(→‎Testing ETL jobs: Add schema-source comparison section)
Line 83: Line 83:


If a schema field is supposed to be both published ("dumped" in Marshmallow jargon) and loaded from the source file, but it can't be found in the source file, an error message is printed to the console. If the job is also trying to push data to the CKAN datastore, an exception is raised.
These checks are helpful when writing, testing, or modifying an ETL job, as they make it easy to find typos in field names and other errors that prevent source data from reaching the output correctly.
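The comparison described above can be illustrated with a minimal sketch. It does not use the Marshmallow API or the actual ETL framework code: <code>FieldSpec</code>, <code>missing_source_fields</code>, and <code>check_schema</code> are hypothetical names invented for this example, and the choice of <code>ValueError</code> as the raised exception is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a schema field (not the Marshmallow API)."""
    name: str                          # field name in the published output
    source_name: Optional[str] = None  # column name in the source file, if different
    dumped: bool = True                # field is published to the output
    loaded: bool = True                # field is read from the source file


def missing_source_fields(fields: Iterable[FieldSpec],
                          source_columns: Iterable[str]) -> List[str]:
    """Return the source column names the schema expects but the file lacks."""
    available = set(source_columns)
    missing = []
    for field in fields:
        expected = field.source_name or field.name
        # Only fields that are both dumped and loaded are checked.
        if field.dumped and field.loaded and expected not in available:
            missing.append(expected)
    return missing


def check_schema(fields, source_columns, pushing_to_datastore=False):
    """Print an error per missing field; raise if the job would push data."""
    missing = missing_source_fields(fields, source_columns)
    for name in missing:
        print(f"ERROR: schema field '{name}' not found in source file")
    if missing and pushing_to_datastore:
        # Assumption: the real framework may raise a different exception type.
        raise ValueError(f"Missing source fields: {missing}")
    return missing
```

For example, a schema expecting a source column <code>address</code> checked against a file whose header is misspelled as <code>adress</code> would report <code>address</code> as missing, which is the kind of field-name typo these checks are meant to surface.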


== Deploying ETL jobs ==