Data Hooks Best Practices
Written by Elisa Dinsmore

This article is for Flatfile's Portal 3.0 and Workspaces. If you'd like to check out the latest version of the Flatfile Data Exchange Platform, click here!

Data Hooks® are short snippets of code that let you validate, manipulate, and format your data as it is being uploaded. Flatfile designed the Platform SDK so that users can quickly build reliable engines for importing messy data.

A few notes about Data Hooks:

  • Data Hooks run on chunks of 1000 records.

  • Each Data Hook chunk runs in parallel.

  • All Data Hooks created in the Dashboard run simultaneously, while Data Hooks created in the Platform SDK run in order.

  • Our lambdas have a timeout of 5 minutes.

Best Practice Recommendations:

Here are the practices we recommend adopting to optimize how your Data Hooks run and minimize the errors you may encounter:

  • Use field hooks when possible (validate, compute, and cast), as these are designed for safety and reusability; a short sketch of field hooks follows this list. For more information on the two types of hooks, visit our documentation.

  • Test, test, test! Testing is the fastest way to develop reliable Data Hooks. Take advantage of the testing options we offer with the Platform SDK to make sure your Data Hooks can handle different types of data and produce the results you're expecting.

  • When writing recordCompute or batchRecordsCompute, include null checks wherever applicable so the hook doesn't fail when a value is missing from a particular field (see the recordCompute sketch after this list).

  • If you are using dependencies, make sure the dependency is included in package.json or added in the UI.

  • Be mindful of where you are making network requests. In the Platform SDK these are only possible in batchRecordsCompute so you can batch your requests for better performance.

  • Use annotations to set info messages when values are changed by compute or default.

  • If you see the error "Unable to complete auto formatting" on the review stage, one of your Data Hooks has failed. If you can't reproduce the error easily with tests, something in the data imported against that sheet may be causing a Data Hook to fail, such as a null value or completely invalid data.

  • If you are using a function that can error, surround it with try/catch to avoid breaking other hooks. If one hook fails, other hooks may not be able to run and the effects can snowball.

  • For Data Hooks created via the Dashboard: strategically combine or separate operations based on validations that need to happen at the same time or in a specific order.
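
To make the field-hook recommendation above concrete, here is a minimal sketch of a sheet that uses compute and validate on individual fields. It assumes the @flatfile/configure Platform SDK; the Contacts, email, and age names are purely illustrative, and the Message signature follows the SDK docs as we recall them, so verify it against your version.

```typescript
import { Sheet, TextField, NumberField, Message } from '@flatfile/configure'

// Illustrative sheet: field hooks (compute, validate) operate on one value at
// a time, which keeps them safe and reusable across sheets.
export const Contacts = new Sheet('Contacts', {
  email: TextField({
    label: 'Email',
    required: true,
    // compute: normalize the raw value before validation runs
    compute: (value) => value.trim().toLowerCase(),
    // validate: flag values that don't look like an email address
    validate: (value) => {
      if (!value.includes('@')) {
        return [new Message('Must be a valid email address', 'error', 'validate')]
      }
    },
  }),
  age: NumberField({
    label: 'Age',
    validate: (value) => {
      if (value < 0) {
        return [new Message('Age cannot be negative', 'error', 'validate')]
      }
    },
  }),
})
```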
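
The null-check and info-message recommendations can be sketched the same way: a recordCompute hook that only derives a value when its inputs are present and flags the change for reviewers. The field names are illustrative, and the record.get, record.set, and record.addInfo calls follow the Platform SDK's FlatfileRecord API as we understand it, so double-check them against your SDK version.

```typescript
import { Sheet, TextField } from '@flatfile/configure'

export const People = new Sheet(
  'People',
  {
    firstName: TextField(),
    lastName: TextField(),
    fullName: TextField(),
  },
  {
    recordCompute: (record) => {
      const first = record.get('firstName')
      const last = record.get('lastName')

      // Null check: only derive fullName when both inputs exist, so a sparse
      // row does not make the whole hook throw.
      if (first && last) {
        record.set('fullName', `${first} ${last}`)
        // Tell reviewers this value was computed rather than imported.
        record.addInfo('fullName', 'Derived from firstName and lastName')
      }
    },
  }
)
```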

Guide to writing and testing Data Hooks in the Platform SDK:

  1. Start with a single sheet. Begin with either a small subset of your desired import shape (no more than 10 columns) or a built-in example, and deploy it. It is important to start with something that works, because then you have a solid base to build upon.

  2. Commit your code to your local git repository. If something stops working in the future (or you see unexpected behavior), you can compare your current codebase to the last known working state.

  3. Once you have that starting point deployed, write your first test in SheetTester and run the test locally (a rough SheetTester test sketch follows this list). Commit again.

  4. Add all of your individual fields, along with compute functions and validations. Write tests for your sheet along the way. Commit each time you have a new set of tests that pass. Publish as necessary.

  5. Once you are happy with all of your field-level hooks and validations (all transformations and validations that depend only on a single field), it is time to start working on your record-level hooks.

  6. First work on recordCompute, which allows you to perform transformations and validations across multiple fields. Continue writing tests and committing regularly.

  7. Next, work on batchRecordsCompute to make any HTTP API calls needed to add extra data to your sheet (see the batchRecordsCompute sketch after this list). Continue to test and commit regularly.

  8. Finally, work on egress.
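
As a companion to steps 3 and 4, here is a rough sketch of a local test built with SheetTester, reusing the illustrative Contacts sheet from the field-hook sketch above. The import path and the testRecord call follow the Platform SDK starter examples as we recall them, so confirm them against your SDK version.

```typescript
import { Workbook } from '@flatfile/configure'
import { SheetTester } from '@flatfile/configure/stdlib/SheetTester'
import { Contacts } from './contacts'

describe('Contacts sheet', () => {
  const workbook = new Workbook({
    name: 'Test Workbook',
    namespace: 'test',
    sheets: { Contacts },
  })
  const tester = new SheetTester(workbook, 'Contacts')

  it('normalizes email addresses', async () => {
    // testRecord runs the field and record hooks against a single input row
    const result = await tester.testRecord({ email: '  JANE@EXAMPLE.COM ' })
    expect(result.email).toEqual('jane@example.com')
  })
})
```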
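
Finally, for step 7, here is a sketch of a batchRecordsCompute hook that batches one network request for the whole chunk and wraps it in try/catch, per the recommendations earlier in this article. The enrichment endpoint, the Companies fields, and the use of axios are all hypothetical; whichever HTTP client you choose must be listed in package.json.

```typescript
import { Sheet, TextField } from '@flatfile/configure'
import axios from 'axios'

export const Companies = new Sheet(
  'Companies',
  {
    domain: TextField({ required: true }),
    industry: TextField(),
  },
  {
    // batchRecordsCompute receives the whole chunk, so one request can cover
    // many rows instead of one request per record.
    batchRecordsCompute: async (payload) => {
      const records = payload.records
      const domains = records
        .map((record) => record.get('domain'))
        .filter((domain) => domain) // null check: skip rows without a domain

      try {
        // Hypothetical enrichment endpoint; replace with your own service.
        const { data: byDomain } = await axios.post(
          'https://api.example.com/enrich',
          { domains }
        )

        for (const record of records) {
          const domain = record.get('domain') as string
          if (domain && byDomain[domain]) {
            record.set('industry', byDomain[domain].industry)
            record.addInfo('industry', 'Filled in from enrichment service')
          }
        }
      } catch (err) {
        // Surround risky calls with try/catch so one failed request does not
        // break the rest of your hooks.
        for (const record of records) {
          record.addWarning('industry', 'Enrichment lookup failed for this batch')
        }
      }
    },
  }
)
```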
