jade.RAbD_BM package¶

jade.RAbD_BM.AnalysisInfo module¶

class jade.RAbD_BM.AnalysisInfo.AnalysisInfo(json_path)[source]¶

Simple class that parses a JSON file which defines the following keys (using relative paths):

exp - The name of the experiment - whatever you want it to be.
decoy_dir - the directory of the decoys.
features_db - the db where the features reporters have been run.

The class stores this information and parses the benchmark info in the decoy dir, storing it as a BenchmarkInfo object. Benchmark classes and scripts take lists of these analysis files and use them to generate plots and data.
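A minimal sketch of what such an analysis file might contain and how its three keys could be read, using only the standard library. The file name, the values, and the `load_analysis_json` helper are hypothetical; the real class also parses the benchmark info from the decoy directory, which is not shown here.

```python
import json
import os
import tempfile


def load_analysis_json(json_path):
    """Read the three documented keys from an analysis JSON file (hypothetical helper)."""
    with open(json_path) as handle:
        data = json.load(handle)
    # These key names are the ones listed in the class description.
    return {key: data[key] for key in ("exp", "decoy_dir", "features_db")}


# Hypothetical example content; all paths are relative, as the class expects.
example = {
    "exp": "my_experiment",
    "decoy_dir": "decoys/my_experiment",
    "features_db": "databases/my_experiment.db",
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis.json")
    with open(path, "w") as handle:
        json.dump(example, handle)
    info = load_analysis_json(path)

print(info["exp"], info["decoy_dir"], info["features_db"])
```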

get_bm_info()[source]¶

Get the benchmark info.

Return type:	rosetta_bms.BenchmarkInfo

get_decoy_dir()[source]¶

get_exp()[source]¶

get_features_db()[source]¶

class jade.RAbD_BM.AnalysisInfo.NativeInfo(dataset, input_pdb_type, root_dataset_dir='datasets')[source]¶

Simple class to hold native information.

get_decoy_dir()[source]¶

get_features_db()[source]¶

jade.RAbD_BM.AnalyzeRecovery module¶

class jade.RAbD_BM.AnalyzeRecovery.AnalyzeRecovery(pyig_design_db_path, analysis_info, native_info, cdrs=None)[source]¶

Pools recovery and risk ratio (RR) data and outputs it to a database.

apply(db_path, drop_tables=False)[source]¶

Calculate all the data and output it to the given database.

Parameters:	db_path – str

initialize()[source]¶

Initialize all input data before calculating and outputting everything.

class jade.RAbD_BM.AnalyzeRecovery.ObservedRecoveryCalculator(native_db_path)[source]¶

Bases: jade.RAbD_BM.AnalyzeRecovery.RecoveryCalculator

apply(exp_name, pdbids, cdrs, bm_decoy_path, output_dir='data')[source]¶

Calculates the number of times the native clusters and lengths were observed during the experiment, for each PDB. Returns the resulting dataframe.

Return type:	pandas.DataFrame

class jade.RAbD_BM.AnalyzeRecovery.PyIgClassifyDBRepresentationCalculator(native_db_path)[source]¶

Bases: jade.RAbD_BM.AnalyzeRecovery.RecoveryCalculator

apply(exp_name, cdrs, pyig_db_path, lambda_kappa_dict, output_dir='data')[source]¶

Calculates the number of times lengths and clusters are present in the PyIgClassify database.

Parameters:	lambda_kappa_dict – dict-like, ['lambda'] = [pdbid,]

Return type:	pandas.DataFrame

class jade.RAbD_BM.AnalyzeRecovery.RecoveryCalculator(native_db_path)[source]¶

Bases: object

class jade.RAbD_BM.AnalyzeRecovery.TopRecoveryCalculator(native_db_path)[source]¶

Bases: jade.RAbD_BM.AnalyzeRecovery.RecoveryCalculator

apply(exp_name, pdbids, cdrs, bm_db_path, output_dir='data')[source]¶

Calculate length and cluster recoveries. Store them the same way we used to for the recovery parser. Returns the resulting dataframe of recoveries.

Return type:	pandas.DataFrame

jade.RAbD_BM.AnalyzeRecovery.calculate_exp_rr_and_recovery(exp, result_df)[source]¶

Calculate the overall recovery and risk ratio.

Return type:	pandas.DataFrame

jade.RAbD_BM.AnalyzeRecovery.calculate_per_cdr_rr_and_recovery(exp, cdrs, result_df)[source]¶

Calculate the recovery and risk ratios per CDR.

Return type:	pandas.DataFrame

jade.RAbD_BM.AnalyzeRecovery.calculate_recovery_and_risk_ratios(top_recovery_df, observed_df)[source]¶

Calculate the risk ratio and recovery percent for each pdb/cdr, given the dataframes output by the calculator classes in this module.

Return a merged dataframe of the top recovery and observed, with the resulting risk ratio data.

Parameters:	top_recovery_df – pandas.DataFrame observed_df – pandas.DataFrame
Return type:	pandas.DataFrame
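As a rough illustration of the quantities involved, the sketch below computes a recovery percent and a risk ratio for one pdb/cdr from raw counts. It assumes the risk ratio is the recovery frequency among the top designs divided by the frequency with which the native length or cluster was observed during the experiment. The function and argument names are hypothetical, and the module itself works on pandas dataframes rather than plain numbers.

```python
def recovery_and_risk_ratio(top_recovered, top_total, observed, observed_total):
    """Return (recovery_percent, risk_ratio) from raw counts.

    Assumption: risk ratio = (recovered / total top designs)
                           / (observed / total observations).
    """
    recovery_freq = top_recovered / top_total
    observed_freq = observed / observed_total
    # Guard against division by zero when the native was never observed.
    risk_ratio = recovery_freq / observed_freq if observed_freq > 0 else float("nan")
    return 100.0 * recovery_freq, risk_ratio


# Hypothetical counts: 25 of 100 top designs recovered the native cluster,
# which was observed in 125 of 1000 sampled decoys.
percent, rr = recovery_and_risk_ratio(25, 100, 125, 1000)
print(percent, rr)
```

Under this assumed definition, a risk ratio above 1 means the native was recovered in the top designs more often than it was sampled.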

jade.RAbD_BM.AnalyzeRecovery.get_decoys(input_dir, pdbid)[source]¶

Use glob to match on pdbid for file names in the input dir. This should skip all the extra PDBs like excn, initial, relax, etc.

Parameters:	input_dir – str pdbid – str
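One way such a glob match could look is sketched below. The glob pattern, the `find_decoys` name, and the explicit substring filter are assumptions made for illustration; how the actual function excludes the extra PDBs is not documented here.

```python
import glob
import os
import tempfile

# Assumed substrings marking non-decoy structures. The names come from the
# description above; the filtering mechanism itself is a guess.
SKIP_TAGS = ("excn", "initial", "relax")


def find_decoys(input_dir, pdbid):
    """Return sorted decoy paths in input_dir whose file name contains pdbid."""
    pattern = os.path.join(input_dir, "*" + pdbid + "*.pdb*")
    matches = glob.glob(pattern)
    return sorted(
        path for path in matches
        if not any(tag in os.path.basename(path) for tag in SKIP_TAGS)
    )


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names standing in for a decoy directory.
    for name in ("1abc_0001.pdb", "1abc_initial_0001.pdb", "2xyz_0001.pdb"):
        open(os.path.join(tmp, name), "w").close()
    decoys = [os.path.basename(p) for p in find_decoys(tmp, "1abc")]

print(decoys)
```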

jade.RAbD_BM.RunBenchmarksRAbD module¶

class jade.RAbD_BM.RunBenchmarksRAbD.RunBenchmarksRAbD[source]¶

Bases: jade.rosetta_jade.RunRosettaBenchmarks.RunRosettaBenchmarks

Benchmark class specifically for RAbD

Details:

ALL INPUT PDBs should go into

project_root/datasets

Typically, you will have multiple directories - native, relaxed, etc.

This is specified as a benchmark using ‘input_pdb_type’ in your json file.

ALL PDBLISTs for benchmarking should go into

project_root/datasets/pdblists

run_benchmark(benchmark_names, benchmark_options)[source]¶

Run a single benchmark with options.

Parameters:	benchmark_names – List of benchmark names benchmark_options – List of benchmark options
Returns:

jade.RAbD_BM.benchmark_plotting module¶

class jade.RAbD_BM.benchmark_plotting.NativeCDRData(datatype, native_path, data_table='cdr_metrics')[source]¶

get_all_data()[source]¶

get_data(pdbid, cdr)[source]¶

setup_data(datatype)[source]¶

class jade.RAbD_BM.benchmark_plotting.PlotData(native_data, rec_data)[source]¶

get_xy_of_exp(exp, rec=True, skip_H3=True)[source]¶

plot_data(outname, rec=True)[source]¶

class jade.RAbD_BM.benchmark_plotting.RecoveryCDRData(db_paths, type='length')[source]¶

setup_data()[source]¶

jade.RAbD_BM.recovery_rr_tools module¶

jade.RAbD_BM.recovery_rr_tools.calculate_geometric_means_rr(df, x, y, hue=None)[source]¶

Example use:

rr_data_lengths = calculate_geometric_means_rr(df_all, x='cdr', y='length_rr', hue='exp')
rr_data_clusters = calculate_geometric_means_rr(df_all, x='cdr', y='cluster_rr', hue='exp')
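The function name suggests that risk ratios are averaged with a geometric mean, which treats a ratio of 2 and a ratio of 0.5 symmetrically. A minimal sketch of that calculation for a single group is shown below; the helper name and the values are hypothetical, and the grouping by x and hue that the real function presumably performs is not shown.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical risk ratios for one CDR across four PDBs.
risk_ratios = [0.5, 2.0, 4.0, 1.0]
gm = geometric_mean(risk_ratios)
print(gm)
```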

jade.RAbD_BM.recovery_rr_tools.calculate_rr_errors(df_all_errors)[source]¶

Calculates the risk ratio errors for clusters and lengths by applying error propagation equations to the errors calculated for the recovery itself. The result is the same for percentages as it would be for raw data, because the N cancels out in the equations. http://lectureonline.cl.msu.edu/~mmp/labs/error/e2.htm
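For reference, the standard propagation rule for a quotient of two independent quantities is sketched below. Whether the function applies exactly this rule is an assumption; the helper name and the numbers are hypothetical.

```python
import math


def ratio_error(a, sigma_a, b, sigma_b):
    """Propagated standard deviation of a / b for independent a and b."""
    ratio = a / b
    # Relative errors add in quadrature for a quotient.
    relative = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return abs(ratio) * relative


# Hypothetical recovery (40 +/- 4) and observed frequency (20 +/- 3), in percent.
rr = 40.0 / 20.0
rr_err = ratio_error(40.0, 4.0, 20.0, 3.0)

# The same data expressed as fractions gives the same error, because only
# the relative errors enter the formula.
rr_err_fraction = ratio_error(0.40, 0.04, 0.20, 0.03)
print(rr, rr_err, rr_err_fraction)
```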

jade.RAbD_BM.recovery_rr_tools.calculate_set_errorbars_hist(ax, data, x, y, binomial_distro=True, total_column='total_entries', y_freq_column=None, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶

Calculates the standard deviation of the data and sets error bars for a histogram. The default base_columnwidth for seaborn plots is 0.8.

Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.

Notes:

If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
Caps are the horizontal lines at the ends of each error bar.
‘full’ means error bars on both vertical sides of the histogram bar.

Warning:

linewidth of .5 does not show up in all PDFs for all bars.

jade.RAbD_BM.recovery_rr_tools.calculate_set_errorbars_scatter(ax, data, x, y, binomial_distro=False, total_column='total_entries', caps=False, color='k', lw=1.5)[source]¶: (Untested) - Calculates the standard deviation of the data, sets error bars for a typical scatter plot

jade.RAbD_BM.recovery_rr_tools.calculate_stddev_binomial_distribution2(df, x, y, total_column, y_mean_column, hue=None, percent=True)[source]¶

Calculates standard deviations for a binomial distribution. Returns a dataframe of standard deviations. If percent=True, we divide by the total to normalize the standard deviation. SD of 'mean' = SQRT(n*p*q), where p is the probability of success and q is the probability of failure.
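The formula above can be sketched for a single group as follows. The helper name and the counts are hypothetical, and the real function operates on dataframe columns rather than scalars.

```python
import math


def binomial_stddev(successes, total, percent=True):
    """Standard deviation sqrt(n*p*q) of a binomial count, optionally normalized."""
    p = successes / total  # probability of success
    q = 1.0 - p            # probability of failure
    sd = math.sqrt(total * p * q)
    # With percent=True, divide by the total so the result is on the same
    # scale as the success fraction.
    return sd / total if percent else sd


# Hypothetical counts: 20 recoveries out of 100 entries.
sd_counts = binomial_stddev(20, 100, percent=False)
sd_fraction = binomial_stddev(20, 100, percent=True)
print(sd_counts, sd_fraction)
```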

jade.RAbD_BM.recovery_rr_tools.load_precomputed_recoveries(db_path='data/all_recovery_and_risk_ratio_data.db', table='full_data')[source]¶

Reads recovery data from a database created via script.

Return type:	pandas.DataFrame

jade.RAbD_BM.recovery_rr_tools.order_by_row_group(df, column, groups)[source]¶: Order a dataframe by groups. Return the dataframe. Probably a better way to do this already, but I don’t know what it is.

jade.RAbD_BM.recovery_rr_tools.plot_rr(data, x, y, hue=None, ci=None)[source]¶

jade.RAbD_BM.recovery_rr_tools.remove_pdb_and_cdr(df, pdbid, cdr)[source]¶

Removes a particular pdbid and cdr from the dataframe. Returns the new dataframe.

jade.RAbD_BM.recovery_rr_tools.set_errorbars_bar(ax, data, x, y, error_dfs, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶

Sets error bars for a bar chart.

Default base_columnwidth for seaborn plots is .8

Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.

Notes:

If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
Caps are the horizontal lines at the ends of each error bar.
‘full’ means error bars on both vertical sides of the histogram bar.

Warning:

linewidth of .5 does not show up in all PDFs for all bars.

jade.RAbD_BM.recovery_rr_tools.set_errorbars_bar_rr(ax, data, x, y, error_dfs, x_order=None, hue_order=None, hue=None, caps=False, color='k', linewidth=0.75, base_columnwidth=0.8, full=True)[source]¶

Sets error bars for a bar chart.

Default base_columnwidth for seaborn plots is .8

Optionally give x_order and/or hue_order for the ordering of the columns. Make sure to pass this while plotting.

Notes:

If Hue is enabled, this base is divided by the number of hue_names for the final width used for plotting.
Caps are the horizontal lines at the ends of each error bar.
‘full’ means error bars on both vertical sides of the histogram bar.

Warning:

linewidth of .5 does not show up in all PDFs for all bars.

jade.RAbD_BM.tools module¶

jade.RAbD_BM.tools.get_lambda_kappa_pdb_ids(dataset, pdb_type, root_dataset_dir='datasets/pdblists')[source]¶

Get two lists: lambda and kappa pdbids

Parameters:	dataset – str root_dataset_dir – str
Return type:	[str],[str]

jade.RAbD_BM.tools.get_pdb_paths(in_dir, exp_name, match_name=None, use_ensemble=False)[source]¶

jade.RAbD_BM.tools_ab_db module¶

jade.RAbD_BM.tools_ab_db.get_all_clusters_for_length(db, cdr, length, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶: Get all unique clusters for a length and a cdr

jade.RAbD_BM.tools_ab_db.get_all_lengths(db, cdr, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶: Get all unique lengths for a CDR

jade.RAbD_BM.tools_ab_db.get_cdr_data_table_df(db_path)[source]¶

Get a dataframe with typical info from the cdr_data table in the PyIgClassify db.

Parameters:	db_con – sqlite3.con
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_ab_db.get_cdr_rmsd_for_entry(db, pdb, original_chain, cdr, length, fullcluster)[source]¶

jade.RAbD_BM.tools_ab_db.get_center_dih_degrees_for_cluster_and_length(db, cdr, length, cluster)[source]¶

Returns a dictionary of center dihedral angles in positional order, or False if not found.

result["phis"] = [phis as floats]
result["psis"] = [psis as floats]
result["omegas"] = [omegas as floats]

jade.RAbD_BM.tools_ab_db.get_center_for_cluster_and_length(db, cdr, length, cluster, data_names_array)[source]¶

jade.RAbD_BM.tools_ab_db.get_cluster_enrichment(df, gene, cdr, cluster)[source]¶

Get the number of rows in the df that match the gene, cdr, and cluster.

Parameters:	df – pandas.DataFrame
Return type:	int

jade.RAbD_BM.tools_ab_db.get_cluster_matches(df, gene, cdr, cluster)[source]¶

Get a dataframe of the matching (“Recovered”) rows (DataFrame).

Parameters:	df – pandas.DataFrame
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_ab_db.get_data_for_cluster_and_length(db, cdr, length, cluster, data_names_array, limit_to_known=True, res_cutoff=2.8, rfac_cutoff=0.3)[source]¶

Get a set of data for a particular length, cdr, and cluster. data_names_array is a list of the data fields to retrieve; an entry can include the DISTINCT keyword.

Example: data_names_array = ["PDB", "original_chain", "new_chain", "sequence"]
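The sketch below shows how a list like data_names_array could be turned into a SELECT statement with sqlite3. The table layout, the column names CDR, length, and fullcluster, the `select_cluster_data` helper, and the sample rows are all assumptions for illustration; the resolution and R-factor cutoffs and the limit_to_known filter are left out.

```python
import sqlite3


def select_cluster_data(con, cdr, length, cluster, data_names_array):
    """Select the requested fields for one CDR, length, and cluster."""
    # Column names cannot be bound as parameters, so they are joined into
    # the statement; only pass trusted names here.
    columns = ", ".join(data_names_array)
    query = (
        "SELECT " + columns + " FROM cdr_data "
        "WHERE CDR = ? AND length = ? AND fullcluster = ?"
    )
    return con.execute(query, (cdr, length, cluster)).fetchall()


# Hypothetical in-memory table standing in for the PyIgClassify database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cdr_data (PDB TEXT, original_chain TEXT, CDR TEXT, "
    "length INTEGER, fullcluster TEXT, sequence TEXT)"
)
con.executemany(
    "INSERT INTO cdr_data VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1abc", "L", "L1", 11, "L1-11-1", "RASQDISNYLN"),
        ("2xyz", "L", "L1", 11, "L1-11-1", "RASQDISNYLN"),
        ("3def", "L", "L1", 11, "L1-11-2", "RASQSISSYLN"),
    ],
)

unique_sequences = select_cluster_data(con, "L1", 11, "L1-11-1", ["DISTINCT sequence"])
entries = select_cluster_data(con, "L1", 11, "L1-11-1", ["PDB", "original_chain"])
print(unique_sequences, entries)
```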

jade.RAbD_BM.tools_ab_db.get_dihedral_string_for_centers(db, limit_to_known=True)[source]¶

jade.RAbD_BM.tools_ab_db.get_length_enrichment(df, gene, cdr, length)[source]¶

Get the number of rows in the df that match the gene, cdr, and length.

Parameters:	df – pandas.DataFrame length – int
Return type:	int

jade.RAbD_BM.tools_ab_db.get_length_matches(df, gene, cdr, length)[source]¶

Get a dataframe of the matching (“Recovered”) rows (DataFrame).

Parameters:	df – pandas.DataFrame length – int
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_ab_db.get_pdb_chain_subset(db, gene)[source]¶: Return a list of tuples of [pdb, chain] of the particular gene

jade.RAbD_BM.tools_ab_db.get_stem_rmsd_for_entry(db, pdb, original_chain, cdr, length, fullcluster)[source]¶

jade.RAbD_BM.tools_ab_db.get_total_entries(df, gene, cdr)[source]¶

Get the total number of entries matching the gene and the cdr. Used for recovery.

Parameters:	df – pandas.DataFrame
Return type:	int

jade.RAbD_BM.tools_ab_db.get_unique_sequences_for_cluster(db, cluster, include_outliers, outlier_definition='conservative')[source]¶

jade.RAbD_BM.tools_features_db module¶

jade.RAbD_BM.tools_features_db.get_all_entries(df, pdbid, cdr)[source]¶

Get all entries of a given PDBID and CDR.

Parameters:	df – pandas.DataFrame
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_features_db.get_cdr_cluster_df(db_path)[source]¶

Get a dataframe with typical cluster info in it, which was generated by the features reporter framework.

Parameters:	db_con – sqlite3.con
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_features_db.get_cluster(df, pdbid, cdr)[source]¶

Get the fullcluster from the dataframe for native or experimental data

Parameters:	df – pandas.DataFrame
Return type:	str

jade.RAbD_BM.tools_features_db.get_cluster_matches(df, pdbid, cdr, cluster)[source]¶

Get a dataframe of the matching (“Recovered”) rows (DataFrame).

Parameters:	df – pandas.DataFrame
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_features_db.get_cluster_recovery(df, pdbid, cdr, cluster)[source]¶

Get the number of rows in the df that match the pdbid, cdr, and cluster.

Parameters:	df – pandas.DataFrame
Return type:	int

jade.RAbD_BM.tools_features_db.get_length(df, pdbid, cdr)[source]¶

Get the length from the dataframe for native or experimental data

Parameters:	df – pandas.DataFrame
Return type:	int

jade.RAbD_BM.tools_features_db.get_length_matches(df, pdbid, cdr, length)[source]¶

Get a dataframe of the matching (“Recovered”) rows (DataFrame).

Parameters:	df – pandas.DataFrame length – int
Return type:	pandas.DataFrame

jade.RAbD_BM.tools_features_db.get_length_recovery(df, pdbid, cdr, length)[source]¶

Get the number of rows in the df that match the pdbid, cdr, and length.

Parameters:	df – pandas.DataFrame length – int
Return type:	int

jade.RAbD_BM.tools_features_db.get_total_entries(df, pdbid, cdr)[source]¶

Get the total number of entries of the particular CDR and PDBID in the database.

Parameters:	df – pandas.DataFrame
Return type:	int