validate commands specifications were first written, the intent was to use JSON Table Schema – which is specifically designed for tabular data.
Of course, being active in the CKAN community, I considered using Frictionless Data. However, it was limited to Python.
After surveying the available crates that can be leverage to build these commands – it became clear that we had to use JSON Schema instead using the jsonschema crate.
validate is quite performant! Validating a million rows in less than 3 seconds.1
$ qsv validate .\NYC_311_SR_2010-2020-sample-1M.csv .\nyc50ksample.csv.schema.json [00:00:02] [==================== 100% validated 1,000,000 records.] (411,861/sec) Writing invalid/valid/error files... 2,995 out of 1,000,000 records invalid.