API Reference - ML Audit

AuditTrialRecorder(df, name=None)

Initialize the recorder with a pandas DataFrame.

df (pd.DataFrame) The initial dataframe to track.

name (str, optional) Name of the experiment/audit trail. Defaults to a timestamped name.

Example

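import pandas as pd
from ml_audit import AuditTrialRecorder
df = pd.DataFrame({"age": [25, None, 40], "salary": [50000, 62000, None]})  # illustrative data
auditor = AuditTrialRecorder(df, name="my_experiment")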

.impute(column, strategy='mean', fill_value=None, method=None)

Fill missing values in one or more columns using statistical strategies or specific methods.

column (str | list) Column name(s) to impute.

strategy (str) 'mean' (default), 'median', 'mode', or 'constant'.

fill_value (any, optional) Value to use when strategy='constant'.

method (str, optional) 'ffill' or 'bfill'. Overrides strategy if provided.

Example

auditor.impute(["age", "salary"], strategy='median')
auditor.impute("stock", method='ffill')
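
To make the strategies concrete, the sketch below reproduces the fill values on a plain Python list using only the standard library. It illustrates the arithmetic and is not ml_audit's implementation; the helper names fill_stat and ffill are hypothetical.

```python
from statistics import mean, median, mode

def fill_stat(values, strategy="mean", fill_value=None):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "constant":
        replacement = fill_value
    else:
        replacement = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [replacement if v is None else v for v in values]

def ffill(values):
    """Forward fill: carry the last observed value into the gaps that follow it."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

ages = [20, None, 30, 40]
print(fill_stat(ages, "median"))   # [20, 30, 30, 40]
print(ffill([5, None, None, 7]))   # [5, 5, 5, 7]
```

Note that a forward fill cannot fill a gap at the very start of a column, because there is no earlier value to carry forward.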

.scale(column, method='standard')

Scale numerical features to a specific range or distribution.

column (str | list) Column name(s) to scale.

method (str) 'standard' (default), 'minmax', 'robust', or 'maxabs'.

Example

auditor.scale(["height", "weight"], method='standard')
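
The method names match the conventional scalers found in libraries such as scikit-learn. Assuming ml_audit follows those definitions (this reference does not state the formulas), each method can be sketched in plain Python as follows; the function scale below is a hypothetical stand-in, not the library's code.

```python
from statistics import mean, median, pstdev, quantiles

def scale(values, method="standard"):
    """Scale a list of numbers using the conventional formula for each method."""
    if method == "standard":      # zero mean, unit (population) standard deviation
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma for v in values]
    if method == "minmax":        # map the observed range onto [0, 1]
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "robust":        # center on the median, divide by the interquartile range
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        med = median(values)
        return [(v - med) / (q3 - q1) for v in values]
    if method == "maxabs":        # divide by the largest absolute value
        peak = max(abs(v) for v in values)
        return [v / peak for v in values]
    raise ValueError(f"unknown method: {method}")

heights = [150, 160, 170, 180, 190]
print(scale(heights, "minmax"))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(scale(heights, "robust"))   # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

The sketch does not guard against a constant column, where every denominator above would be zero.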

.encode(column, method='onehot', target_col=None)

Encode categorical features.

column (str | list) Column name(s) to encode.

method (str) 'onehot' (default), 'label', or 'target'.

target_col (str, optional) Name of the target column. Required only when method='target'.

Example

auditor.encode("color", method='onehot')
auditor.encode("zip", method='target', target_col='price')
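
The three encodings differ in what replaces each category. The sketch below shows the usual textbook form of each on plain lists; the helper names are hypothetical, and details such as category ordering or smoothing in target encoding are assumptions that ml_audit may handle differently.

```python
from collections import defaultdict

def label_encode(values):
    """Map each category to an integer code, in sorted category order."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

def onehot_encode(values):
    """Return one 0/1 indicator column per category."""
    return {cat: [int(v == cat) for v in values] for cat in sorted(set(values))}

def target_encode(values, target):
    """Replace each category with the mean target value observed for it."""
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[v].append(t)
    means = {cat: sum(ts) / len(ts) for cat, ts in groups.items()}
    return [means[v] for v in values]

colors = ["red", "blue", "red"]
print(label_encode(colors))                  # [1, 0, 1]
print(onehot_encode(colors))                 # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
print(target_encode(colors, [10, 40, 30]))   # [20.0, 40.0, 20.0]
```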

.transform(column, func='log')

Apply mathematical transformations to columns.

column (str | list) Column name(s) to transform.

func (str) 'log' (default; applied as log1p, i.e. log(1 + x)), 'sqrt', 'cbrt', or 'square'.

Example

auditor.transform("income", func='log')
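
For reference, the four transformations correspond to the following standard-library operations. The TRANSFORMS table is a hypothetical illustration of the documented names, not the library's internals.

```python
import math

# Hypothetical lookup table mirroring the documented func names.
TRANSFORMS = {
    "log": math.log1p,                                       # log(1 + x), defined at x = 0
    "sqrt": math.sqrt,
    "cbrt": lambda x: math.copysign(abs(x) ** (1 / 3), x),   # real cube root, sign preserved
    "square": lambda x: x * x,
}

incomes = [0, 99, 999]
print([round(TRANSFORMS["log"](x), 3) for x in incomes])   # [0.0, 4.605, 6.908]
```

Using log1p rather than a plain logarithm keeps zero-valued entries finite, which matters for columns such as income where zeros are common.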

.bin_numeric(column, bins=5, strategy='quantile', labels=None)

Discretize continuous variables into bins.

column (str | list) Column name(s) to bin.

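bins (int) Number of bins to create. Defaults to 5.

strategy (str) 'quantile' (equal frequency, default) or 'uniform' (equal width).

labels (list, optional) Labels to assign to the bins. Defaults to None.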

Example

auditor.bin_numeric("age", bins=4, strategy='quantile')
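
The difference between the two strategies shows up clearly on skewed data. The sketch below computes the cut points each strategy would produce under the usual definitions; bin_edges and assign_bins are hypothetical helpers, and the handling of values that fall exactly on an edge is a convention chosen here, not taken from ml_audit.

```python
from bisect import bisect_right
from statistics import quantiles

def bin_edges(values, bins, strategy="quantile"):
    """Return the interior cut points that split the values into `bins` groups."""
    if strategy == "quantile":           # equal frequency: cut at the quantiles
        return quantiles(values, n=bins, method="inclusive")
    lo, hi = min(values), max(values)    # uniform: equal-width intervals
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]

def assign_bins(values, edges):
    """Return the 0-based bin index of each value."""
    return [bisect_right(edges, v) for v in values]

ages = [18, 20, 22, 25, 30, 40, 60, 90]
print(bin_edges(ages, 4, "uniform"))                       # [36.0, 54.0, 72.0]
print(assign_bins(ages, bin_edges(ages, 4, "uniform")))    # [0, 0, 0, 0, 0, 1, 2, 3]
print(assign_bins(ages, bin_edges(ages, 4, "quantile")))   # [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal-width bins most of the values land in the first bin, while quantile bins hold two values each.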

.extract_date_features(column, features=['year', 'month', 'day'])

Extract features from datetime columns.

column (str | list) The datetime column(s).

features (list) Features to extract; any of 'year', 'month', 'day', 'weekday', 'hour'. Defaults to ['year', 'month', 'day'].

Example

auditor.extract_date_features("joined_at", features=['year', 'month'])
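
As a rough picture of what each feature name yields, the sketch below extracts the same fields from a single standard-library datetime. The EXTRACTORS table and the extract helper are hypothetical; the Monday = 0 weekday numbering follows Python's datetime convention and is only assumed to match what ml_audit returns.

```python
from datetime import datetime

# Hypothetical mapping from feature name to a datetime accessor.
EXTRACTORS = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "weekday": lambda d: d.weekday(),   # Monday = 0 ... Sunday = 6
    "hour": lambda d: d.hour,
}

def extract(stamp, features=("year", "month", "day")):
    """Return a dict of the requested calendar features for one timestamp."""
    return {name: EXTRACTORS[name](stamp) for name in features}

joined_at = datetime(2024, 3, 15, 9, 30)
print(extract(joined_at, ["year", "month", "weekday"]))
# {'year': 2024, 'month': 3, 'weekday': 4}
```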

.balance_classes(target, strategy='oversample', random_state=42)

Balance class distribution in the target variable.

target (str) The target class column.

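strategy (str) 'oversample' (default), 'undersample', or 'smote' (requires the imbalanced-learn package, imported as imblearn).

random_state (int) Seed for the random sampling, so results are reproducible. Defaults to 42.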

Example

auditor.balance_classes("churn", strategy='smote')
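
The two sampling strategies can be sketched with a seeded random generator, which also shows why random_state matters for reproducibility. This is a minimal illustration of random over- and undersampling on a list of dict rows; the resample function is hypothetical, and 'smote' is not covered because it synthesizes new rows by interpolation rather than resampling existing ones.

```python
import random
from collections import defaultdict

def resample(rows, target, strategy="oversample", random_state=42):
    """Equalize class counts by duplicating or discarding rows at random."""
    rng = random.Random(random_state)    # seeded so the result is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)
    sizes = [len(group) for group in by_class.values()]
    balanced = []
    for group in by_class.values():
        if strategy == "oversample":     # draw extra rows with replacement
            balanced += group + rng.choices(group, k=max(sizes) - len(group))
        else:                            # undersample: keep a random subset
            balanced += rng.sample(group, min(sizes))
    return balanced

rows = [{"churn": 0}] * 8 + [{"churn": 1}] * 2
print(len(resample(rows, "churn", "oversample")))    # 16
print(len(resample(rows, "churn", "undersample")))   # 4
```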

.filter_rows / .drop_columns

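Basic DataFrame manipulations: filter_rows keeps the rows that satisfy a comparison on one column, and drop_columns removes the listed columns.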

# Filter: column, operator, value
auditor.filter_rows("age", ">=", 18)

# Drop: columns (list)
auditor.drop_columns(["id", "tmp"])
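
Passing the comparison as a string suggests a lookup from operator symbol to comparison function. The sketch below shows one way such a lookup could work on plain dict rows; the OPERATORS table is an assumption, since this reference does not list which operators filter_rows accepts.

```python
import operator

# Hypothetical operator table; the operators ml_audit accepts are not listed in this reference.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def filter_rows(rows, column, op, value):
    """Keep the rows where `row[column] <op> value` holds."""
    compare = OPERATORS[op]
    return [row for row in rows if compare(row[column], value)]

def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

people = [{"id": 1, "age": 17}, {"id": 2, "age": 18}, {"id": 3, "age": 42}]
adults = filter_rows(people, "age", ">=", 18)
print(drop_columns(adults, ["id"]))   # [{'age': 18}, {'age': 42}]
```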

.track_pandas(method_name, *args, **kwargs)

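Apply an arbitrary pandas DataFrame method by name and record the call in the audit trail. Additional positional and keyword arguments are passed through to that method.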

auditor.track_pandas("dropna", subset=['col1'])
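
Calling a method by name and logging it is a common dispatch pattern built on getattr. The sketch below demonstrates the pattern on a plain string so that it runs without pandas; TrackingWrapper and its log format are hypothetical and say nothing about how ml_audit stores its entries.

```python
class TrackingWrapper:
    """Hypothetical stand-in: wraps any object and logs method calls made by name."""

    def __init__(self, data):
        self.data = data
        self.log = []

    def track(self, method_name, *args, **kwargs):
        method = getattr(self.data, method_name)   # raises AttributeError if unknown
        result = method(*args, **kwargs)
        self.log.append({"method": method_name, "args": list(args), "kwargs": kwargs})
        self.data = result                         # assumes the method returns new data

wrapper = TrackingWrapper("  Raw Text  ")
wrapper.track("strip")
wrapper.track("replace", " ", "_")
wrapper.track("lower")
print(wrapper.data)                          # raw_text
print([e["method"] for e in wrapper.log])    # ['strip', 'replace', 'lower']
```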

.export_audit_trail(filename=None, visualize=True)

Save the audit log to a JSON file and optionally generate an HTML visualization.

filename (str) Output filename. Defaults to [name]_audit.json.

visualize (bool) If True (default), generates a corresponding .html file in visualizations/.
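
The default filename rule can be illustrated with a small standard-library sketch. The export_log helper, its directory parameter, and the JSON layout shown here are all hypothetical; only the [name]_audit.json default comes from this reference.

```python
import json
import tempfile
from pathlib import Path

def export_log(entries, name, filename=None, directory="."):
    """Write the log entries to JSON, defaulting to <name>_audit.json."""
    path = Path(directory) / (filename or f"{name}_audit.json")
    path.write_text(json.dumps({"name": name, "steps": entries}, indent=2))
    return path

steps = [{"operation": "impute", "column": "age", "strategy": "median"}]
with tempfile.TemporaryDirectory() as tmp:
    out = export_log(steps, "my_experiment", directory=tmp)
    print(out.name)                                               # my_experiment_audit.json
    print(json.loads(out.read_text())["steps"][0]["operation"])   # impute
```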