PBQ

class PBQ(query: pbq.query.Query, project=None)

BigQuery driver using the official Google API

query : str: the query
query_obj : Query: pbq.Query object
client : Client: the client object for bigquery
bqstorage_client : BigQueryStorageClient: the google storage client object

to_dataframe(save_query=False, **params): return the query results as data frame
to_csv(filename, sep=',', save_query=False, **params): save the query results to a csv file
save_to_table(table, dataset, project=None, replace=True, partition=None): save query to table
run_query(): simply execute your query
table_details(table, dataset, project): get the information about the table

save_file_to_table(filename, table, dataset, project, file_format=bigquery.SourceFormat.CSV, max_bad_records=0, replace=True, partition=None): save a file to a table; the table can be partitioned, and the file can be appended to an existing table. The supported formats are CSV and PARQUET
save_dataframe_to_table(df: pd.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None): same as save_file_to_table, but takes a pandas dataframe
table_exists(client: bigquery.Client, table_ref: bigquery.table.TableReference): check whether the table exists; returns True if it exists, False otherwise

getting query to dataframe

>>> from pbq import Query, PBQ
>>> query = Query("select * from table")

>>> print("the query price:", query.price)

>>> if not query.validate():
...     raise RuntimeError("query not valid")

>>> pbq = PBQ(query)
>>> pbq.to_dataframe()

saving query to csv

>>> from pbq import Query, PBQ
>>> query = Query("select * from table")
>>> pbq = PBQ(query)
>>> pbq.to_csv("results.csv")  # filename is required; this path is only illustrative

saving dataframe to table

>>> import pandas as pd
>>> from pbq import Query, PBQ
>>> df = pd.DataFrame()

>>> PBQ.save_dataframe_to_table(df, 'table', 'dataset', 'project_id', partition='20191013', replace=False)

run_query()

execute your query

static save_dataframe_to_table(df: pandas.core.frame.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None, validate_params=False)

save a pd.DataFrame object to a table

Parameters:

df – pd.DataFrame the dataframe you want to save
table – str table name
dataset – str dataset name
project – str project name
max_bad_records – int number of bad records allowed (default: 0)
replace – boolean if True, replace the table; otherwise append to it (default: True)
partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)
validate_params – boolean validate the dataframe against the schema of the table (default: False)
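
The example earlier on this page passes partition='20191013', which reads as a year-month-day string. A minimal sketch, assuming that layout, of deriving the value from a date object. partition_for is a hypothetical helper and not part of pbq:

```python
from datetime import date


def partition_for(day: date) -> str:
    """Format a date as a partition string (assumed layout: YYYYMMDD)."""
    return day.strftime("%Y%m%d")


print(partition_for(date(2019, 10, 13)))  # prints 20191013
```

The resulting string could then be passed as the partition argument of save_dataframe_to_table.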

static save_file_to_table(filename, table, dataset, project, file_format='CSV', max_bad_records=0, replace=True, partition=None)

save a file to a table; the table can be partitioned, and the file can be appended to an existing table. The supported formats are CSV and PARQUET

Parameters:

filename – str path of the file to load
table – str table name
dataset – str dataset name
project – str project name
file_format – str file format, CSV or PARQUET (default: CSV)
max_bad_records – int number of bad records allowed in the file (default: 0)
replace – boolean if True, replace the table; otherwise append to it (default: True)
partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)
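
One way to use this method is to stage in-memory rows as a temporary CSV file and then hand the path to the loader. The sketch below assumes only the documented argument order. write_csv and append_rows are hypothetical helpers, and the loader is injected so the code runs without BigQuery credentials. No header row is written, because this page does not say whether a header would be skipped.

```python
import csv
import os
import tempfile


def write_csv(rows):
    """Write rows to a temporary CSV file (no header) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def append_rows(loader, rows, table, dataset, project):
    """Stage rows as CSV, then pass the file to `loader` in append mode.

    `loader` is assumed to accept the documented save_file_to_table
    arguments; in real use it would be PBQ.save_file_to_table.
    """
    path = write_csv(rows)
    try:
        loader(path, table, dataset, project, file_format="CSV", replace=False)
    finally:
        os.remove(path)  # remove the staging file whether or not the load succeeded
```

In real use the call might look like append_rows(PBQ.save_file_to_table, rows, 'table', 'dataset', 'project_id').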

save_to_table(table, dataset, project=None, replace=True, partition=None)

save the query results to a table

Parameters:

table – str table name
dataset – str dataset name
project – str project name
replace – boolean if True, replace the table; otherwise append to it (default: True)
partition – str partition in YYYYMMDD format, e.g. '20191013' (default: None)

static table_details(table, dataset, project)

return a dict object with some details about the table

Parameters:

table – str table name
dataset – str dataset name
project – str project name

Returns:

dict with table information such as last_modified_time, num_bytes, num_rows, and creation_time
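
As an illustration of consuming the returned dict, the sketch below formats the documented keys into a one-line summary. It assumes num_bytes is numeric; the value types are not stated on this page. describe_table is a hypothetical helper, and the sample dict is made up for the example.

```python
def describe_table(details: dict) -> str:
    """Build a short summary from a table_details-style dict."""
    size_mib = details["num_bytes"] / (1024 ** 2)  # assumes a numeric byte count
    return (
        f"{details['num_rows']} rows, {size_mib:.2f} MiB, "
        f"last modified {details['last_modified_time']}"
    )


# Illustrative values only, not real output from BigQuery.
sample = {
    "num_rows": 10,
    "num_bytes": 2097152,
    "last_modified_time": "2019-10-13",
    "creation_time": "2019-10-01",
}
print(describe_table(sample))  # prints 10 rows, 2.00 MiB, last modified 2019-10-13
```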

static table_exists(client: google.cloud.bigquery.client.Client, table_ref: google.cloud.bigquery.table.TableReference)

check whether the table exists

Parameters:

client – bigquery.Client object
table_ref – bigquery.table.TableReference object with the table name and dataset

Returns:

boolean True if the table exists, False if it does not

to_csv(filename, sep=', ', save_query=False, **params)

save the query results to a csv file

to also save the query results to a table, turn on save_query and pass table and dataset in params; the table is saved to the same project

Parameters:

filename – str path of the csv file to write
sep – str separator for the csv file
save_query – boolean whether to also save the query results to a table
params – dict when the save_query flag is on, the relevant params (table, dataset)

to_dataframe(save_query=False, **params)

return the query results as a dataframe

to save the query results to a table as well as getting the dataframe, turn on save_query and pass table and dataset in params; the table is saved to the same project

Parameters:

save_query – boolean whether to also save the query results to a table
params – dict when the save_query flag is on, the relevant params (table, dataset)

Returns:

pd.DataFrame the query results
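
The save_query behavior described above can be wrapped in a small helper that checks both keys before delegating. This is a sketch under the assumption that table and dataset are passed as keyword arguments through **params. dataframe_and_save is a hypothetical name, and pbq_obj stands for any object exposing the documented to_dataframe signature.

```python
def dataframe_and_save(pbq_obj, table, dataset):
    """Return the query results and also save them to dataset.table."""
    if not table or not dataset:
        raise ValueError("both table and dataset are required when save_query is on")
    # Assumed calling convention: the two keys travel through **params.
    return pbq_obj.to_dataframe(save_query=True, table=table, dataset=dataset)
```

With a real PBQ instance this might be called as dataframe_and_save(PBQ(query), 'table', 'dataset').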
