Clean data
Get your data ready for analysis.
Cleaning data in Quadratic can feel more seamless than you may be used to, because your data stays visible in the sheet as you step through your DataFrame. Every change to your DataFrame can be reflected in the sheet in real time. Below are some data cleaning steps you may want to take (a far from exhaustive list):
Assume a DataFrame named df. With df.head(x) you can display the first x rows of your data. Make this the last line of your code and those rows will display in the spreadsheet. You can do the same for the last x rows with df.tail(x).
# display first five rows
df.head(5)
# display last five rows
df.tail(5)
Deleting columns by point and click is easy: highlight the entire column and press Delete. Alternatively, do this programmatically with the code below.
# assuming a DataFrame named df
# pick the columns you want to drop
columns_to_drop = ['Average viewers', 'Followers']
df.drop(columns=columns_to_drop, inplace=True)
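If you would rather keep the original DataFrame untouched, a minimal sketch of the same drop that returns a new DataFrame might look like the following. The sample data and the name cleaned are made up for illustration; errors='ignore' is a standard pandas option that skips column names that are not present.

```python
import pandas as pd

# Hypothetical sample data standing in for your sheet
df = pd.DataFrame({
    'Channel': ['a', 'b'],
    'Average viewers': [100, 200],
    'Followers': [10, 20],
})

# 'Not a column' does not exist; errors='ignore' skips it instead of raising KeyError
columns_to_drop = ['Average viewers', 'Followers', 'Not a column']

# Without inplace=True, drop returns a new DataFrame and leaves df unchanged
cleaned = df.drop(columns=columns_to_drop, errors='ignore')

# As the last line of a cell, this is what would be displayed
cleaned
```

Keeping the original around can make it easier to compare before and after while you experiment.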
There are countless ways to make field-specific changes; the example below may give you some ideas.
# set the 'Duration' value in the row with index label 7 to 45
df.loc[7, 'Duration'] = 45
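The same .loc pattern also accepts a condition in place of a single row label, which can help when many cells need the same fix. The sketch below assumes, purely for illustration, that negative durations are invalid and should be reset to 0; the sample values are made up.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Duration': [30, -5, 60]})

# Select every row where Duration is negative and overwrite that cell with 0
df.loc[df['Duration'] < 0, 'Duration'] = 0

df
```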
Going column by column to clean specific things is best done programmatically.
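# things_to_replace: the value(s) to look for; what_to_replace_with: the new value
# assign the result back, since inplace=True on a single column may not update df in newer pandas versions
df['col1'] = df['col1'].replace(things_to_replace, what_to_replace_with)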
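As one possible way to clean a single column, the sketch below trims stray whitespace and then maps several placeholder spellings to one value using a dictionary. The column contents and the mapping are hypothetical; Series.str.strip and Series.replace with a dict are standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data with inconsistent entries
df = pd.DataFrame({'col1': [' yes', 'N/A', '-', 'no ']})

# Remove leading and trailing whitespace from every value
df['col1'] = df['col1'].str.strip()

# Map each placeholder (key) to its replacement (value)
replacements = {'N/A': 'unknown', '-': 'unknown'}
df['col1'] = df['col1'].replace(replacements)

df
```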
In Quadratic you can simply delete rows via point and click; in other cases, you may need to do this programmatically.
# if you know the row's index label x, drop it directly (drop returns a new DataFrame)
df = df.drop(x)
# or select the index of the rows matching a condition, then drop those rows
x = df[(df.Name == 'bob') & (df.Age == 25) & (df.Grade == 'A')].index
df = df.drop(x)
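An equivalent approach, sketched here with made-up data, is to build a boolean mask for the rows you want gone and keep everything else. The columns Name, Age, and Grade follow the snippet above; the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['bob', 'amy', 'bob'],
    'Age': [25, 31, 40],
    'Grade': ['A', 'B', 'A'],
})

# True for rows that match all three conditions
mask = (df['Name'] == 'bob') & (df['Age'] == 25) & (df['Grade'] == 'A')

# Keep the rows where the mask is False, then renumber the index from 0
df = df[~mask].reset_index(drop=True)

df
```

Filtering with a mask avoids the intermediate index lookup, and reset_index(drop=True) is optional if you want to keep the original row labels.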
Identifying empty rows should be intuitive in the spreadsheet via point-and-click; in other cases, you may need to do this programmatically.
import numpy as np
# replace empty strings with NaN so that dropna can detect them
df['col1'] = df['col1'].replace('', np.nan)
# delete rows where specific columns are empty
df.dropna(subset=['Tenant'], inplace=True)
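To remove rows that are empty across every column, one option is to treat blank or whitespace-only cells as missing first and then drop rows where everything is missing. This is a sketch with made-up data; the Rent column is hypothetical, and the regular expression is an assumption about what counts as "empty".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 1 and 2 are effectively empty
df = pd.DataFrame({
    'Tenant': ['Acme', '', '   ', 'Globex'],
    'Rent': ['1200', '', '', '900'],
})

# Treat empty or whitespace-only cells as missing (the pattern must match the whole cell)
df = df.replace(r'^\s*$', np.nan, regex=True)

# Drop only the rows in which every column is missing
df = df.dropna(how='all')

df
```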
By default, Quadratic inputs will be read as strings by Python code. Manipulate these data types as you see fit in your DataFrame.
# specify column(s) to change data type
# common types: float, int, string, datetime64[ns]
df = df.astype({'col1': 'int', 'col2': 'float'})
# check the resulting data types
df.dtypes
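astype raises an error if any value cannot be converted, so when a column may contain stray text, a more forgiving sketch is to use pd.to_numeric with errors='coerce', which turns unparseable values into NaN. Dates can be parsed with pd.to_datetime. The sample values below are made up.

```python
import pandas as pd

# Hypothetical sample data, all read in as strings
df = pd.DataFrame({
    'col1': ['1', '2', 'oops'],
    'col2': ['2023-01-05', '2023-02-10', '2023-03-15'],
})

# Convert to numbers; values that cannot be parsed become NaN instead of raising
df['col1'] = pd.to_numeric(df['col1'], errors='coerce')

# Parse ISO-formatted date strings into datetime values
df['col2'] = pd.to_datetime(df['col2'])

df.dtypes
```

Because NaN is a float, col1 ends up as a float column here; filling or dropping the NaN rows first would be needed before converting it to int.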
Duplicates are likely best removed programmatically, not visually. Save some time with the code below.
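# each call below returns a new DataFrame; assign the result back to keep it
# drop rows that are duplicated across all columns
df = df.drop_duplicates()
# drop duplicates based on specific columns
df = df.drop_duplicates(subset=['col1'])
# drop duplicates on specific columns, keeping the last occurrence
df = df.drop_duplicates(subset=['col1', 'col2'], keep='last')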
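Before dropping anything, it can be useful to see how many rows are actually duplicated. The sketch below uses DataFrame.duplicated, which marks each row that repeats an earlier one; the sample data and the names dup_mask, n_dupes, and deduped are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: the second row repeats the first
df = pd.DataFrame({
    'col1': ['a', 'a', 'b'],
    'col2': [1, 1, 2],
})

# Boolean Series: True for rows that repeat an earlier row
dup_mask = df.duplicated()

# Count of duplicate rows that drop_duplicates would remove
n_dupes = int(dup_mask.sum())

# Remove the duplicates and renumber the index from 0
deduped = df.drop_duplicates().reset_index(drop=True)

deduped
```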