When the specifications for the schema and validate commands were first written, the intent was to use JSON Table Schema, which is specifically designed for tabular data.
Of course, being active in the CKAN community, I considered using Frictionless Data. However, it was limited to Python.
After surveying the available crates that could be leveraged to build these commands, it became clear that we had to use JSON Schema instead, via the jsonschema crate.
And validate is quite performant! It validates a million rows in less than 3 seconds.¹
$ qsv validate .\NYC_311_SR_2010-2020-sample-1M.csv .\nyc50ksample.csv.schema.json
[00:00:02] [==================== 100% validated 1,000,000 records.] (411,861/sec)
Writing invalid/valid/error files...
2,995 out of 1,000,000 records invalid.
That said, the jsonschema crate is still evolving, and qsv will also support JSON Table Schema if and when that becomes feasible in Rust.
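To make the idea of checking CSV records against schema rules more concrete, here is a minimal, standard-library-only sketch in Rust. It is not how qsv or the jsonschema crate is implemented: the `Rule`, `Column`, and `validate_row` names are hypothetical, the two rules only loosely mirror JSON Schema's `type`, `minimum`, and `minLength` keywords, and the column names and sample rows are made up for illustration.

```rust
/// Hypothetical, heavily simplified stand-in for a JSON Schema property rule.
/// This is an illustration only, not the jsonschema crate's API.
enum Rule {
    /// Roughly {"type": "integer", "minimum": min}
    Integer { min: i64 },
    /// Roughly {"type": "string", "minLength": min_len}
    Text { min_len: usize },
}

/// One column of the illustrative schema: a field name plus its rule.
struct Column {
    name: &'static str,
    rule: Rule,
}

/// Checks a single CSV record against the schema and returns every violation found.
/// An empty vector means the record is valid.
fn validate_row(schema: &[Column], row: &[&str]) -> Vec<String> {
    let mut errors = Vec::new();
    if row.len() != schema.len() {
        errors.push(format!("expected {} fields, found {}", schema.len(), row.len()));
        return errors;
    }
    for (column, value) in schema.iter().zip(row) {
        let name = column.name;
        match &column.rule {
            Rule::Integer { min } => match value.parse::<i64>() {
                Ok(n) if n >= *min => {}
                Ok(n) => errors.push(format!("{name}: {n} is below the minimum of {min}")),
                Err(_) => errors.push(format!("{name}: '{value}' is not an integer")),
            },
            Rule::Text { min_len } => {
                if value.chars().count() < *min_len {
                    errors.push(format!("{name}: shorter than {min_len} characters"));
                }
            }
        }
    }
    errors
}

fn main() {
    // Made-up schema and records, for illustration only.
    let schema = [
        Column { name: "unique_key", rule: Rule::Integer { min: 1 } },
        Column { name: "agency", rule: Rule::Text { min_len: 1 } },
    ];
    let rows: [&[&str]; 3] = [&["101", "NYPD"], &["0", "DOT"], &["abc", ""]];

    // Count the records that produce at least one violation.
    let invalid = rows
        .iter()
        .filter(|row| !validate_row(&schema, row).is_empty())
        .count();
    println!("{invalid} out of {} records invalid.", rows.len());
}
```

Collecting every violation per record, rather than stopping at the first one, is a deliberate choice in this sketch: it allows a report to list all problems with a row at once.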