jade.basic.pandas package¶

jade.basic.pandas.PandasDataFrame module¶

class jade.basic.pandas.PandasDataFrame.GeneralPandasDataFrame(data=None, index=None, columns=None, dtype=None, copy=False)[source]¶

Bases: pandas.core.frame.DataFrame

detect_numeric()[source]¶

drop_duplicate_columns()[source]¶

Drop duplicate columns from the DataFrame in place.

get_columns(columns)[source]¶

get_matches(column, to_match)[source]¶

Get all the rows that match a particular element of a column.

Parameters:	column – str
	to_match – str
Return type:	pandas.DataFrame

get_row_matches(column1, to_match, column2)[source]¶

Get the elements of column2 from the rows where column1 matches to_match. If there is only one element, the result can easily be converted to a single value.

Parameters:	column1 – str
	to_match – str
	column2 – str
Return type:	pandas.Series

n_matches(column, to_match)[source]¶

Return the number of matches.

Parameters:	column – str
	to_match – str
Return type:	int

to_tsv(path_or_buf=None, na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, doublequote=True, escapechar=None, decimal='.')[source]¶
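The to_tsv signature mirrors pandas.DataFrame.to_csv, which suggests it is a tab-separated counterpart. The following is a minimal sketch in plain pandas under that assumption; it does not call jade, and the actual implementation may differ.

```python
import pandas as pd

df = pd.DataFrame({"exp": ["a", "b"], "score": [1.5, 2.0]})

# Assumption: to_tsv behaves like to_csv with a tab separator.
# With no path given, to_csv returns the text instead of writing a file.
tsv_text = df.to_csv(sep="\t", index=False)
print(tsv_text)
```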

jade.basic.pandas.PandasDataFrame.detect_numeric(df)[source]¶

Detect numeric components

Parameters:	df – pd.DataFrame
Return type:	pd.DataFrame

jade.basic.pandas.PandasDataFrame.drop_duplicate_columns(df)[source]¶

Drop duplicate columns from the DataFrame and return the resulting DataFrame.

Parameters:	df – pandas.DataFrame
Return type:	pandas.DataFrame
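One common way to drop duplicate columns in plain pandas is sketched below. It assumes that "duplicate" means a repeated column name and that the first occurrence is kept; whether jade makes the same choices is not stated here.

```python
import pandas as pd

# Two columns share the name "a"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

# columns.duplicated() flags every repeat of a name after its first occurrence;
# inverting the mask keeps only the first column for each name
deduped = df.loc[:, ~df.columns.duplicated()]
print(list(deduped.columns))
```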

jade.basic.pandas.PandasDataFrame.get_columns(df, columns)[source]¶

Get a new DataFrame containing only the given columns.

Parameters:	df – pandas.DataFrame
	columns – list
Return type:	pd.DataFrame

jade.basic.pandas.PandasDataFrame.get_match_by_array(df, column, match_array)[source]¶

Get a new DataFrame containing the rows of df whose value in column appears in the series match_array.

Note: The result is a DataFrame, but plotting it with seaborn may cause unexplained issues.

Parameters:	df – pd.DataFrame
	column – str
	match_array – pd.Series
Return type:	pd.DataFrame

jade.basic.pandas.PandasDataFrame.get_matches(df, column, to_match)[source]¶

Get all the rows that match a particular element of a column.

Parameters:	df – pandas.DataFrame
	column – str
	to_match – str
Return type:	pd.DataFrame
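Row matching of this kind is usually written in pandas as a boolean mask. The sketch below uses plain pandas with made-up column names and data; it illustrates the described behavior rather than the jade source. Counting the matches, as get_n_matches describes, is then the length of the result.

```python
import pandas as pd

df = pd.DataFrame({"cdr": ["H1", "H2", "H2"], "length": [10, 12, 14]})

# Boolean mask: True where the column equals the value to match
matches = df[df["cdr"] == "H2"]

# Number of matching rows
n = len(matches)
print(matches, n)
```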

jade.basic.pandas.PandasDataFrame.get_multiple_matches(df, column, to_match_array)[source]¶

Get all the rows that match any of the values in to_match_array.

Parameters:	df – pandas.DataFrame
	column – str
	to_match_array – list
Return type:	pd.DataFrame
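Matching against several values at once can be sketched with Series.isin. This is plain pandas with illustrative data, and it is an assumption that jade does something equivalent.

```python
import pandas as pd

df = pd.DataFrame({"cdr": ["H1", "H2", "H3", "L1"], "length": [10, 12, 14, 11]})
to_match_array = ["H1", "L1"]

# isin() is True for rows whose value equals any entry of the list
subset = df[df["cdr"].isin(to_match_array)]
print(subset)
```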

jade.basic.pandas.PandasDataFrame.get_n_matches(df, column, to_match)[source]¶

Get the number of matches.

Parameters:	df – pd.DataFrame
	column – str
	to_match
Return type:	int

jade.basic.pandas.PandasDataFrame.get_row_matches(df, column1, to_match, column2)[source]¶

Get the elements of column2 from the rows where column1 matches to_match. If there is only one element, the result can easily be converted to a single value.

Parameters:	df – pd.DataFrame
	column1 – str
	to_match – str
	column2 – str
Return type:	pd.Series
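Selecting one column from the matching rows, and converting a single-element result to a scalar, might look like the following in plain pandas. The data and column names are illustrative, and the equivalence to jade's implementation is assumed.

```python
import pandas as pd

df = pd.DataFrame({"cdr": ["H1", "H2", "H2"], "length": [10, 12, 14]})

# Values of "length" for the rows where "cdr" equals "H2" (a Series)
lengths = df.loc[df["cdr"] == "H2", "length"]

# A one-element Series can be reduced to a scalar by position
single = df.loc[df["cdr"] == "H1", "length"]
value = single.iloc[0]
print(lengths.tolist(), value)
```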

jade.basic.pandas.PandasDataFrame.get_value(df, column)[source]¶

Get a single value from a one-row DataFrame. This helps make calling code self-documenting, since the iloc syntax is awkward.

Parameters:	df – pd.DataFrame
	column – str
Returns:	value
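The iloc idiom that such a helper would wrap can be sketched as follows, in plain pandas with illustrative data. Positional access matters because a filtered row keeps its original index label.

```python
import pandas as pd

df = pd.DataFrame({"cdr": ["H1", "H2"], "length": [10, 12]})

# Filtering leaves one row, which keeps its original index label (1, not 0)
one_row = df[df["cdr"] == "H2"]

# iloc[0] selects by position, so it works regardless of the index label
value = one_row["length"].iloc[0]
print(value)
```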

jade.basic.pandas.PandasDataFrame.multi_tab_excel(df_list, sheet_list, file_name)[source]¶

Writes multiple DataFrames as separate sheets in an output Excel file.

If the output directory does not exist, it is created.

Author: Tom Dobbs http://stackoverflow.com/questions/32957441/putting-many-python-pandas-dataframes-to-one-excel-worksheet

Parameters:	df_list – [pd.DataFrame]
	sheet_list – [str]
	file_name – str

jade.basic.pandas.PandasDataFrame.sort_on_list(df, column, sort_order)[source]¶

Given a column and a list of values giving the desired order, create a new DataFrame whose rows are sorted so that the column follows that order.

Parameters:	df – pd.DataFrame
	column – str
	sort_order – list
Return type:	pd.DataFrame
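Sorting rows by a custom value order can be sketched with an ordered categorical. This is one plain-pandas approach with illustrative data and a temporary helper column named _order (hypothetical); jade may implement the sort differently.

```python
import pandas as pd

df = pd.DataFrame({"exp": ["c", "a", "b", "a"], "score": [3, 1, 2, 4]})
sort_order = ["b", "a", "c"]

# An ordered categorical sorts by position in sort_order, not alphabetically
order_key = pd.Categorical(df["exp"], categories=sort_order, ordered=True)

sorted_df = (
    df.assign(_order=order_key)                  # temporary sort key
      .sort_values("_order", kind="mergesort")   # mergesort is a stable sort
      .drop(columns="_order")
      .reset_index(drop=True)
)
print(sorted_df)
```

Values of the column that are missing from sort_order become NaN in the categorical and are placed last by sort_values.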

jade.basic.pandas.stats module¶

jade.basic.pandas.stats.calculate_stddev(df, x, y, hue=None)[source]¶

Calculates standard deviations for a normal distribution (numerical data) over X and Hue categories.

If hue is given, the hue column is added to the output, and the rows computed over all of the data are labelled ‘ALL’.

Example DataFrame output (x=‘exp’, y=‘length_recovery_freq’, hue=‘cdr’):

    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq

Parameters:	df – pandas.DataFrame
	x – str
	y – str
	hue – str
Return type:	pandas.DataFrame
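The grouping described above can be approximated in plain pandas as follows. This is a sketch under assumptions: the data are made up, the output layout only loosely follows the example, and pandas' default sample standard deviation (ddof=1) is used, which may not match jade's choice.

```python
import pandas as pd

df = pd.DataFrame({
    "exp": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "cdr": ["H1", "H1", "H2", "H2", "H1", "H1", "H2", "H2"],
    "length_recovery_freq": [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 5.0, 9.0],
})
x, y, hue = "exp", "length_recovery_freq", "cdr"

# SD of y within each (hue, x) group
per_hue = df.groupby([hue, x])[y].std().reset_index(name="SD")

# SD of y within each x group over all hue values, labelled 'ALL'
overall = df.groupby(x)[y].std().reset_index(name="SD")
overall[hue] = "ALL"

result = pd.concat([per_hue, overall], ignore_index=True)
result["y"] = y  # record which column the SD was computed from
print(result)
```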

jade.basic.pandas.stats.calculate_stddev_binomial_distribution(df, x, y, total_column, y_mean_column, hue=None)[source]¶

Calculates standard deviations for a binomial distribution (such as True/False experiment outcomes) over X and Hue categories.

Typically used for bar plots.

If hue is given, the hue column is added to the output, and the rows computed over all of the data are labelled ‘ALL’, in addition to the rows for each hue value.

Example DataFrame output (x=‘exp’, y=‘length_recovery_freq’, hue=‘cdr’):

    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq

Parameters:	df – pandas.DataFrame
	x – str
	y – str
	total_column – str
	y_mean_column – str
	hue – str
Return type:	pandas.DataFrame
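For reference, the textbook standard deviations of a binomial outcome are sketched below. The helper names are hypothetical, and it is not stated here which of the two forms (count or fraction) jade reports, so treat this as background rather than a description of the implementation.

```python
import math

def binomial_sd_count(n, p):
    """SD of the number of successes in n independent trials (hypothetical helper)."""
    return math.sqrt(n * p * (1 - p))

def binomial_sd_fraction(n, p):
    """SD of the observed success fraction over n trials (hypothetical helper)."""
    return math.sqrt(p * (1 - p) / n)

# 100 trials with a 20% success rate
sd_count = binomial_sd_count(100, 0.2)
sd_fraction = binomial_sd_fraction(100, 0.2)
print(sd_count, sd_fraction)
```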