jade.basic.pandas package

jade.basic.pandas.PandasDataFrame module

class jade.basic.pandas.PandasDataFrame.GeneralPandasDataFrame(data=None, index=None, columns=None, dtype=None, copy=False)[source]

Bases: pandas.core.frame.DataFrame

detect_numeric()[source]
drop_duplicate_columns()[source]

Drop duplicate columns from the DataFrame in place.

get_columns(columns)[source]
get_matches(column, to_match)[source]

Get all the rows that match a particular element of a column.

Parameters:
  • column – str
  • to_match – str
Return type:

pandas.DataFrame

get_row_matches(column1, to_match, column2)[source]

Get the elements of column2 for the rows where column1 matches to_match. If only one row matches, the resulting Series holds a single element and is easy to convert to a scalar.

Parameters:
  • column1 – str
  • to_match – str
  • column2 – str
Return type:

pandas.Series

n_matches(column, to_match)[source]

Return the number of matches.

Parameters:
  • column – str
  • to_match – str
Return type:

int

to_tsv(path_or_buf=None, na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=False, date_format=None, doublequote=True, escapechar=None, decimal='.')[source]
jade.basic.pandas.PandasDataFrame.detect_numeric(df)[source]

Detect numeric components

Parameters: df – pd.DataFrame
Return type: pd.DataFrame
jade.basic.pandas.PandasDataFrame.drop_duplicate_columns(df)[source]

Drop duplicate columns from the DataFrame and return the resulting DataFrame.

Parameters: df – pandas.DataFrame
Return type: pandas.DataFrame
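
As a rough illustration of the behavior described above (not the jade source), duplicate columns can be dropped in plain pandas by keeping the first occurrence of each label. The helper name drop_duplicate_columns_sketch is hypothetical, and reading "duplicate" as "same column label" is an assumption.

```python
import pandas as pd


def drop_duplicate_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: keep the first column for each repeated label."""
    # columns.duplicated() marks every repeat of a label after its first occurrence.
    return df.loc[:, ~df.columns.duplicated()]


# The label "a" appears twice; only its first column survives.
frame = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])
deduped = drop_duplicate_columns_sketch(frame)
```

This version returns a new DataFrame rather than modifying the input, which mirrors the module-level function more closely than the in-place method.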
jade.basic.pandas.PandasDataFrame.get_columns(df, columns)[source]

Get a new DataFrame containing only the given columns.

Parameters:
  • df – pandas.DataFrame
  • columns – list
Return type:

pd.DataFrame

jade.basic.pandas.PandasDataFrame.get_match_by_array(df, column, match_array)[source]

Get a new DataFrame containing the rows of df whose values in column appear in the series match_array.

Note: The result is a DataFrame, but plotting it with seaborn may produce unexpected issues; the cause is unknown.
Parameters:
  • df – pd.DataFrame
  • column – str
  • match_array – pd.Series
Return type:

pd.DataFrame

jade.basic.pandas.PandasDataFrame.get_matches(df, column, to_match)[source]

Get all the rows that match a particular element of a column.

Parameters:
  • df – pandas.DataFrame
  • column – str
  • to_match – str
Return type:

pd.DataFrame
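
The row selection described above can be sketched with a boolean mask in plain pandas. This is an assumption about the behavior, not the library's code; get_matches_sketch and the sample data are hypothetical.

```python
import pandas as pd


def get_matches_sketch(df: pd.DataFrame, column: str, to_match) -> pd.DataFrame:
    """Hypothetical stand-in: rows where df[column] equals to_match."""
    return df[df[column] == to_match]


frame = pd.DataFrame({"cdr": ["H1", "H2", "H2"], "length": [10, 12, 14]})
h2_rows = get_matches_sketch(frame, "cdr", "H2")

# Counting the matching rows gives the kind of number get_n_matches reports.
n_h2 = len(h2_rows)
```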

jade.basic.pandas.PandasDataFrame.get_multiple_matches(df, column, to_match_array)[source]

Get all the rows that match any of the values in to_match_array.

Parameters:
  • df – pandas.DataFrame
  • column – str
  • to_match_array – list
Return type:

pd.DataFrame
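
Matching against several values at once is commonly written with Series.isin. The sketch below assumes that is the intended semantics; get_multiple_matches_sketch and the sample data are hypothetical.

```python
import pandas as pd


def get_multiple_matches_sketch(df: pd.DataFrame, column: str, to_match_array) -> pd.DataFrame:
    """Hypothetical stand-in: rows whose value in `column` is in to_match_array."""
    # isin() returns True for each row whose value appears in the given collection.
    return df[df[column].isin(to_match_array)]


frame = pd.DataFrame({"cdr": ["H1", "H2", "L1", "L3"], "length": [10, 12, 11, 9]})
heavy = get_multiple_matches_sketch(frame, "cdr", ["H1", "H2"])
```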

jade.basic.pandas.PandasDataFrame.get_n_matches(df, column, to_match)[source]

Get the number of matches.

Parameters:
  • df – pd.DataFrame
  • column – str
  • to_match
Return type:

int

jade.basic.pandas.PandasDataFrame.get_row_matches(df, column1, to_match, column2)[source]

Get the elements of column2 for the rows where column1 matches to_match. If only one row matches, the resulting Series holds a single element and is easy to convert to a scalar.

Parameters:
  • df – pd.DataFrame
  • column1 – str
  • to_match – str
  • column2 – str
Return type:

pd.Series
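
A minimal sketch of this lookup in plain pandas, assuming the function filters on column1 and returns the corresponding column2 values. The name get_row_matches_sketch and the sample data are hypothetical.

```python
import pandas as pd


def get_row_matches_sketch(df: pd.DataFrame, column1: str, to_match, column2: str) -> pd.Series:
    """Hypothetical stand-in: column2 values for rows where column1 equals to_match."""
    return df.loc[df[column1] == to_match, column2]


frame = pd.DataFrame({"cdr": ["H1", "H2", "H2"], "length": [10, 12, 14]})
lengths = get_row_matches_sketch(frame, "cdr", "H2", "length")

# With exactly one matching row, the Series has one element to pull out by position.
single_value = get_row_matches_sketch(frame, "cdr", "H1", "length").iloc[0]
```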

jade.basic.pandas.PandasDataFrame.get_value(df, column)[source]

Get a single value from a one-row DataFrame. This helper exists to make calling code self-documenting, since the equivalent iloc syntax is hard to read.

Parameters:
  • df – pd.DataFrame
  • column – str
Returns:

value
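
For comparison, this is roughly the iloc expression such a helper would wrap. It is a sketch under the assumption that the first (and only) row is read by position; get_value_sketch is a hypothetical name.

```python
import pandas as pd


def get_value_sketch(df: pd.DataFrame, column: str):
    """Hypothetical stand-in: the value of `column` in the first row of df."""
    # iloc[0] selects by position, so it works whatever the row's index label is.
    return df[column].iloc[0]


frame = pd.DataFrame({"cdr": ["H1", "H2"], "length": [10, 12]})
one_row = frame[frame["cdr"] == "H2"]  # a one-row frame whose index label is 1
value = get_value_sketch(one_row, "length")
```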

jade.basic.pandas.PandasDataFrame.multi_tab_excel(df_list, sheet_list, file_name)[source]

Writes multiple DataFrames as separate sheets in an output Excel file.

If the output directory does not exist, it is created.

Author: Tom Dobbs http://stackoverflow.com/questions/32957441/putting-many-python-pandas-dataframes-to-one-excel-worksheet

Parameters:
  • df_list – [pd.DataFrame]
  • sheet_list – [str]
  • file_name – str
jade.basic.pandas.PandasDataFrame.sort_on_list(df, column, sort_order)[source]

Given a column and a list of values, create a new DataFrame whose rows are sorted so that the column follows the order of the list.

Parameters:
  • df
  • column
  • sort_order
Return type:

pd.DataFrame
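
One common way to sort rows by a custom order in plain pandas is an ordered categorical key. The sketch below shows that technique; it is not necessarily how jade implements it, and sort_on_list_sketch and the temporary _sort_key column are hypothetical.

```python
import pandas as pd


def sort_on_list_sketch(df: pd.DataFrame, column: str, sort_order: list) -> pd.DataFrame:
    """Hypothetical stand-in: sort rows so `column` follows the order in sort_order."""
    # An ordered categorical sorts by position in sort_order, not alphabetically.
    key = pd.Categorical(df[column], categories=sort_order, ordered=True)
    # Sort on a temporary key column, then drop it again.
    return df.assign(_sort_key=key).sort_values("_sort_key").drop(columns="_sort_key")


frame = pd.DataFrame({"cdr": ["H1", "H2", "L1"], "length": [10, 12, 11]})
ordered = sort_on_list_sketch(frame, "cdr", ["L1", "H1", "H2"])
```

Values missing from sort_order become NaN in the key and sort to the end with this approach.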

jade.basic.pandas.stats module

jade.basic.pandas.stats.calculate_stddev(df, x, y, hue=None)[source]

Calculates standard deviations for a normal distribution (numerical data) over the x and hue categories.

If hue is given, a hue column is added to the output, and the overall values are labeled ‘ALL’.

Example DataFrame output (x=‘exp’, y=‘length_recovery_freq’, hue=‘cdr’):

    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq

Parameters:
  • df – pandas.DataFrame
  • x – str
  • y – str
  • hue – str
Return type:

pandas.DataFrame
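
A per-category standard deviation of this kind can be sketched with groupby in plain pandas. This is an illustration under assumptions, not the jade implementation: stddev_by_group_sketch is a hypothetical name, it uses the sample standard deviation (pandas' default, ddof=1), and it omits the ‘ALL’ rows.

```python
import pandas as pd


def stddev_by_group_sketch(df: pd.DataFrame, x: str, y: str, hue=None) -> pd.DataFrame:
    """Hypothetical stand-in: SD of column y within each x (and hue) category."""
    keys = [x] if hue is None else [x, hue]
    # Series.std() defaults to the sample standard deviation (ddof=1).
    out = df.groupby(keys)[y].std().rename("SD").reset_index()
    out["y"] = y  # record which column the SD was computed over
    return out


frame = pd.DataFrame({
    "exp": ["A", "A", "B", "B"],
    "length_recovery_freq": [1.0, 3.0, 2.0, 2.0],
})
sd = stddev_by_group_sketch(frame, "exp", "length_recovery_freq")
```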

jade.basic.pandas.stats.calculate_stddev_binomial_distribution(df, x, y, total_column, y_mean_column, hue=None)[source]

Calculates standard deviations for a binomial distribution (such as experiment True/False values) over the x and hue categories.

Typically used for bar plots.

If hue is given, a hue column is added to the output; the overall values are labeled ‘ALL’, alongside the values for each hue category.

Example DataFrame output (x=‘exp’, y=‘length_recovery_freq’, hue=‘cdr’):

    SD        cdr  exp                   y
20  6.739596  H2   ALL                   length_recovery_freq
21  7.373650  H2   min.remove_antigen-F  length_recovery_freq
22  6.400637  ALL  min.remove_antigen-T  length_recovery_freq

Parameters:
  • df – pandas.DataFrame
  • x – str
  • y – str
  • total_column – str
  • y_mean_column – str
  • hue – str
Return type:

pandas.DataFrame
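
For orientation, the textbook standard deviation of a binomial proportion p observed over n trials is sqrt(p * (1 - p) / n). The sketch below applies that formula per row in plain pandas. Whether jade uses exactly this form, and how total_column and y_mean_column map onto n and p, are assumptions; the column names freq and total are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per category, with an observed frequency (p)
# and the number of trials it was measured over (n).
frame = pd.DataFrame({
    "exp": ["A", "B"],
    "freq": [0.5, 0.2],
    "total": [100, 400],
})

# Textbook SD of a binomial proportion: sqrt(p * (1 - p) / n).
frame["SD"] = np.sqrt(frame["freq"] * (1.0 - frame["freq"]) / frame["total"])
```

The SD shrinks as the number of trials grows, which is why such values are often drawn as error bars on bar plots.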