Seven Free Data Wrangling Tools

Michael Buckbee
Last updated February 25, 2022

Reformatting, de-duping, merging, and filtering are just some of the functions that go under the broad category of data wrangling. It’s all the scrubbing and cleaning that data scientists apply to raw data before it’s ready for real analysis. Even The New York Times ran an article about this less glamorous side of Big Data, referring to wrangling as ‘janitor work’.

We prefer to see it instead as an important first step to understanding and gaining insights into the datasets you’ll be dealing with. However, this doesn’t have to be a tedious manual task.


There are some great free tools available that can make this part of data science less of a chore. Cindy and I scoured the Internet to put together the following list of powerful wrangleware.

1. Tabula

Ever had to convert table data embedded in a PDF into a spreadsheet? There should be a better way to do this than pasting raw PDF text into Excel and then spending hours forcing the messy data into the right columns. The very smart Tabula does this task automatically. It’s available as a GitHub project. It’s great for marketers, data journalists, and financial analysts, as well as data scientists.

2. OpenRefine

OpenRefine began as a Google Code project and now lives on as open source software. Its friendly GUI is very good at letting you describe and then manipulate data. It was meant for people who aren’t data scientists to use directly, but it also has a powerful set of programmable expressions for more sophisticated tasks.

3. “R” packages

R is an important programming language for data scientists. It has strong support for statistical and probability functions, and, unlike general purpose languages, it excels at handling slabs of numeric data. R can be extended through libraries, or packages, so you don’t have to reinvent the data wrangling wheel. R programmers use the functions in the popular dplyr and tidyr packages to help them tame unruly data. There’s a good overview of how to wrangle with R, courtesy of the folks at ComputerWorld.

If you’re new to R and want to give it a try, there’s a great interactive tutorial over at Code School.

Wrangle data visually with DataWrangler.

4. DataWrangler

Highly recommended by top analysts, visualizers and data scientists, DataWrangler is an interactive tool for data cleaning. It takes messy, real-world data and transforms it into data tables. Then you can export to Excel, Tableau, R, etc. The goal: spend less time manually formatting and more time analyzing your data.

5. CSVKit

csvkit is a set of command-line tools for working with CSV files. It can convert data (Excel to CSV, JSON to CSV), query it with SQL, and much more. Simply put, csvkit will make your data wrangling life easier.
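To give a feel for the "query with SQL" idea without installing anything, here is a minimal sketch that uses only Python's standard library. It is not csvkit's own code or interface. The sample data, the table name `people`, and the helper `query_csv` are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """name,city,age
Ada,London,36
Grace,New York,45
Linus,Helsinki,28
"""


def query_csv(csv_text, sql, table="people"):
    """Load CSV text into an in-memory SQLite table and run one query."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    conn = sqlite3.connect(":memory:")
    try:
        # Column names come from the CSV header; quote them so spaces are safe.
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# CSV values are loaded as text, so cast before comparing numerically.
older = query_csv(
    SAMPLE_CSV,
    "SELECT name FROM people WHERE CAST(age AS INTEGER) > 30 ORDER BY name",
)
print(older)
```

The point is only that a plain CSV file can be treated like a database table. A dedicated tool saves you from writing this plumbing each time.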

6. Python and Pandas

Python is, of course, an excellent language for data manipulation. Add the Pandas library, with its DataFrame object, and data scientists can quickly perform even more complex operations, such as merging, joining, and transforming huge hunks of data with a single Python statement.
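As a small illustration of the "single statement" idea, the sketch below de-dupes one table, merges it with another, and totals the result in one chained expression. The two sample tables and their column names are invented for this example.

```python
import pandas as pd

# Hypothetical sample tables; the column names are made up for illustration.
customers = pd.DataFrame(
    {"customer_id": [1, 2, 2, 3], "name": ["Ada", "Grace", "Grace", "Linus"]}
)
orders = pd.DataFrame(
    {"customer_id": [1, 2, 3, 3], "amount": [20.0, 35.5, 10.0, 5.0]}
)

# De-dupe, merge, and aggregate in one chained statement.
totals = (
    customers.drop_duplicates()                   # drop the repeated "Grace" row
    .merge(orders, on="customer_id", how="left")  # attach each customer's orders
    .groupby("name", as_index=False)["amount"]    # group order amounts by customer
    .sum()                                        # total per customer
)
print(totals)
```

Chaining keeps each wrangling step visible on its own line, which makes it easier to comment out or reorder a step while exploring a dataset.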

7. Mr. Data Converter

Mr. Data Converter is straightforward: it takes Excel data and transforms it into web-friendly formats like HTML, JSON and XML.
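For a rough idea of what this kind of conversion involves, here is a minimal standard-library sketch. It assumes the spreadsheet cells are available as tab-separated text with a header row. It is not Mr. Data Converter's code, and the sample data and the helper name `table_to_json` are hypothetical.

```python
import csv
import io
import json

# Hypothetical cells as they might look when copied out of a spreadsheet:
# tab-separated columns with a header row.
PASTED = "name\tcity\nAda\tLondon\nGrace\tNew York\n"


def table_to_json(text):
    """Turn tab-separated text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return json.dumps(list(reader), indent=2)


as_json = table_to_json(PASTED)
print(as_json)
```

Each spreadsheet row becomes one JSON object keyed by the header names, which is the shape most web code expects.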
