Quadratic Docs
Ask or search…
Comment on page

Clean data

Get your data ready for analysis.
Cleaning data in Quadratic is more seamless than you may be used to, as your data is viewable in the sheet as you step through your DataFrame. Every change to your DataFrame can be reflected in the sheet in real-time. Some data cleaning steps you may be interested in taking (very much non-exhaustive!):

1. View select sections of your DataFrame in the sheet

Assume DataFrame named df. With df.head() you can display the first x rows of your spreadsheet. With this as your last line the first x rows will display in the spreadsheet. You can do the same except with the last x rows via df.tail()
// display first five rows
// display last five rows

2. Drop specified columns

Deleting columns point and click can be done easily by highlighting the entire column and pressing Delete. Alternatively, do this programmatically with the code below.
// assuming DataFrame df
// pick the columns you want to drop
columns_to_drop = ['Average viewers', 'Followers']
df.drop(columns_to_drop, inplace=True, axis=1)

3. Field-specific changes

There are a million ways to make field-specific changes, but maybe this list will give you some ideas.
// replace row 7 in column 'Duration' with the value of 45
df.loc[7, 'Duration'] = 45

4. Clean columns

Going column by column to clean specific things is best done programmatically.
// specify things to replace empty strings to prep drop
df['col1'].replace(things_to_replace, what_to_replace_with, inplace=True)

5. Delete select rows

With the beauty of Quadratic, feel free just to delete rows via point and click; in other cases, you may need to do this programmatically.
// Knowing your row you can directly drop via following
// select a specific index, then drop that index
x = df[((df.Name == 'bob') &( df.Age == 25) & (df.Grade == 'A'))].index

6. Delete empty rows

Identifying empty rows should be intuitive in the spreadsheet via point-and-click; in other cases, you may need to do this programmatically.
// replace empty strings to prep drop
df['col1'].replace('', np.nan, inplace=True)
// delete where specific columns are empty
df.dropna(subset=['Tenant'], inplace=True)

7. Change data types

By default, Quadratic inputs will be read as strings by Python code. Manipulate these data types as you see fit in your DataFrame.
// specify column(s) to change data type
// common types: float, int, datetime, string
df.astype({'col1': 'int', 'col2': 'float'}).dtypes

8. Remove duplicates

Duplicates are likely best removed programmatically, not visually. Save some time with the code below.
// drop duplicates across DataFrame
// drop duplicates on specific columns
// drop duplicates, keep last
df.drop_duplicates(subset=['col1', 'col2'], keep='last')