Quadratic Docs
  • Getting started
  • Examples
  • Open Quadratic
  • Quadratic AI
    • Getting started
    • Generate code
    • Generate data
    • Import PDFs
    • Import images
    • Security
  • Connections
    • SQL - getting started
    • PostgreSQL
    • MySQL
    • MS SQL Server
    • Parametrize queries
    • SQL AI assistant
    • Security
    • API requests
    • Request a connection
  • Python
    • Getting started
    • Reference cells
    • Return data to the sheet
    • Packages
    • Make an API request
    • Clean data
    • Charts/visualizations
    • Manipulate data
  • Javascript
    • Getting started
    • Reference cells
    • Return data to the sheet
    • API Requests
    • Charts/visualizations
    • Packages
  • Formulas
    • Getting started
    • AI assistant
    • Reference cells
    • Functions and operators
    • Arrays
    • Criteria
    • Wildcards
  • Spreadsheet
    • Navigating
    • Files
    • Shortcuts
    • Insert/delete rows and columns
    • Data validation
    • Present & share
    • Date-time formatting
    • Browser compatibility
  • Teams
    • Manage your team
    • Private files
    • Collaboration
    • Embedded sheets
  • Import data
    • SQL connections
    • API requests
    • Drag and drop .CSV
    • Drag and drop .Parquet
    • Import Excel files
  • Self hosting
    • Getting started
    • Docker
    • AWS
    • Azure
    • Google Cloud Platform
    • Bring your own AI
    • Other hosting
  • Quadratic for Education
    • Overview
    • Enrolling in the education plan
    • Teachers
    • Students
    • Researchers
    • Education FAQ
  • Company
    • About
    • Quadratic is source available
    • Brand assets
  • GitHub
  • Blog
  • Twitter
  • Discord
Powered by GitBook
On this page
  • 1. View select sections of your DataFrame in the sheet
  • 2. Drop specified columns
  • 3. Field-specific changes
  • 4. Clean columns
  • 5. Delete select rows
  • 6. Delete empty rows
  • 7. Change data types
  • 8. Remove duplicates

Was this helpful?

  1. Python

Clean data

Get your data ready for analysis.

PreviousMake an API requestNextCharts/visualizations

Last updated 7 months ago

Was this helpful?

Cleaning data in Quadratic is more seamless than you may be used to, as your data is viewable in the sheet as you step through your DataFrame. Every change to your DataFrame can be reflected in the sheet in real-time. Some data cleaning steps you may be interested in taking (very much non-exhaustive!):

1. View select sections of your DataFrame in the sheet

Assume DataFrame named df. With df.head() you can display the first x rows of your spreadsheet. With this as your last line the first x rows will display in the spreadsheet. You can do the same except with the last x rows via df.tail()

# Display first five rows
df.head(5)

# Display last five rows
df.tail(5)

2. Drop specified columns

Deleting columns point and click can be done by highlighting the entire column and pressing Delete. Alternatively, do this programmatically with the code below.

# Assuming DataFrame df, pick the columns you want to drop
columns_to_drop = ['Average viewers', 'Followers']

df.drop(columns_to_drop, inplace=True, axis=1)

3. Field-specific changes

There are many ways to make field-specific changes, but this list will give you some ideas.

# Replace row 7 in column 'Duration' with the value of 45
 df.loc[7, 'Duration'] = 45

4. Clean columns

Going column by column to clean specific things is best done programmatically.

# Specify things to replace empty strings to prep drop 
df['col1'].replace(things_to_replace, what_to_replace_with, inplace=True)

5. Delete select rows

With the beauty of Quadratic, feel free to delete rows via point and click; in other cases, you may need to do this programmatically.

# Knowing your row, you can directly drop via
df.drop(x)

# Select a specific index, then drop that index
x = df[((df.Name == 'bob') &( df.Age == 25) & (df.Grade == 'A'))].index
df.drop(x)

6. Delete empty rows

Identifying empty rows should be intuitive in the spreadsheet via point-and-click; in other cases, you may need to do this programmatically.

# Replace empty strings to prep drop 
df['col1'].replace('', np.nan, inplace=True)

# Delete where specific columns are empty 
df.dropna(subset=['Tenant'], inplace=True)

7. Change data types

By default, Quadratic inputs will be read as strings by Python code. Manipulate these data types as you see fit in your DataFrame.

# Specify column(s) to change data type
df.astype({'col1': 'int', 'col2': 'float'}).dtypes

# Common types: float, int, datetime, string

8. Remove duplicates

Duplicates are likely best removed programmatically, not visually. Save some time with the code below.

# Drop duplicates across DataFrame
df.drop_duplicates()

# Drop duplicates on specific columns 
df.drop_duplicates(subset=['col1'])

# Drop duplicates; keep the last 
df.drop_duplicates(subset=['col1', 'col2'], keep='last')
View select sections of your DataFrame in the sheet
Drop specified columns
Field-specific changes
Clean columns
Delete select rows
Delete empty rows
Change data types
Remove duplicates