Categories

Get Started Now

14 File Types You Can Import Into pandas

As a data scientist or analyst, you’ll probably come across many file types to import and use in your Python scripts. Some analysts use Microsoft Excel, but the application limits what you can do with large data imports. The better option is pandas — a powerful analysis toolkit that’s much more intuitive for a data scientist.

What file formats can pandas use?

Python can handle virtually any data file format — much more than Microsoft Excel. That’s the strength of Python. It’s open source, and there’s probably a library out there to handle it, so you get a vastly more compatible system.

Python can work with the following file formats:

  1. Comma-separated values (CSV)
  2. XLSX
  3. ZIP
  4. Plain Text (txt)
  5. JSON
  6. XML
  7. HTML
  8. Images
  9. Hierarchical Data Format
  10. PDF
  11. DOCX
  12. MP3
  13. MP4
  14. SQL

How to install pandas

Installing pandas is easy. Open your terminal and use this code:

[code language="shell"]
pip install pandas
[/code]

For Python3 you’ll need to use:

[code language="shell"]
pip3 install pandas
[/code]

If you’re getting a sea of red errors, you might need to a sudo command and then enter your password:

sudo -H pip install pandas

How to import files in Python

With almost any data science project, you need to import files. pandas makes these imports intuitive, but even better it’s done with only a few lines of code.

How to import HTML with pandas

HTML is usually unstructured data, but you could have a page with a table of data that you want to import. Pandas reads HTML in just one line of code and parses the data for you, including the tag elements.

[code language="python"]
import pandas as pd
tables = pd.read_html('https://www.cbtnuggets.com/blog/2018/09/13-honest-data-center-technician-salaries-in-2018/')
tables.head()
[/code language]

NOTE: This is only for pages with tables. Pandas works its magic easily when a page has a table, like our 13 Honest Data Center Tech Salaries post.

If you want to do some serious scraping, you should look into Beautiful Soup.

How to import SQL with pandas

You could run a SQL query on a relational database and export data to a structured file format, or you can run a SQL statement directly in your Python scripts. Pandas has a function that takes a SQL string and runs it against the relational database.

It’s a little trickier because you need to write a SQL query, and also open a connection to a SQL database. Don’t let that discourage you. SQL is straightforward, and it’s easy once you get the connection set up with Python library pyodbc.

[code language="python"]
import pandas as pd
import pyodbc
cnxn = pyodbc.connect(connection_string)
sql = ("SELECT first_name, last_name FROM Customers")
data = pd.read_sql(sql, cnxn)
[/code]

You’ll definitely want to learn more about SQL connections in Ben Finkel’s pandas course.

How to import a CSV with pandas

Pandas also makes imports of a CSV easy.

[code language="python"]
import pandas as pd
data = pd.read_csv('mydata.csv', sep = ,)
[/code]

Importantly, you can put virtually any file format into the pd.read_[file format] section, and it’ll work the same way.

How to export formats

After you import data, you might perform calculations and arrange new data structures in your scripts. You might then want to export data to a new file that contains your calculations.

Pandas has a function that exports data to a CSV. You can export using other file formats, but most data scientists prefer to export in a file format that’s more easily transferred between systems. We mentioned that CSV is used to transfer data in an example environment between MySQL and SQL Server, but it’s used in other situations where you need to ensure that a third-party is able to import your file.

You might not know what system your client uses or need to export data for a third-party to use. Using a CSV file format, you can better ensure that any other software, database engine, or compiled code can import your information without incompatibility issues.

How to export a CSV with pandas

[code language="python"]
import pandas as pd
data.to_csv('example.csv')
[/code]

With this newly exported file, you or any third-party can import it to another system including another Python script. These few lines of code are all that it takes to import data, and pandas makes it possible.

How to convert any file to an Excel file with pandas

CSV file formats are one of the most common. A CSV file is used to transport data between two different platforms. For instance, a CSV file can be used to transfer data between SQL Server and MySQL. It’s a universal format that also requires very little formatting overhead.

How to convert a JSON file to a CSV with pandas

JSON files are structured for computers, not humans. Data analysts might need to get a better look at data in an easy-to-read format. With pandas, that’s easy.

[code language="python"]
import pandas as pd
import simplejson
df = pd.read_json(name.json')
df.to_csv('name.csv', index=False)
[/code]

Library tip: The simplejson library provides line-by-line error messages that make life much easier. For instance, if your .json file isn’t formatted correctly, simplejson will tell you exactly which line is faulty. The json library only tells you there’s an issue (or issues) somewhere in a huge, not-so-easy-to-read file.

You can convert virtually any data file format into any other format with pandas.

Conclusion

As a data scientist, you need to use the right libraries and file formats that help your career. When you become familiar with file formats and the functions that easily import and export data, you are a more versatile data scientist with extended skills that you bring to potential employers.

 

Not a CBT Nuggets subscriber? Start your free week now.

CBT Nuggets has everything you need to learn new IT skills and advance your career — unlimited video training and Practice Exams, Virtual Labs, validated learning with in-video Quizzes, Accountability Coaching, and access to our exclusive community of IT professionals.

Learn more about the CBT Nuggets Learning Experience.

 

Comments are closed.