PBQ

class PBQ(query: pbq.query.Query, project=None)

BigQuery driver built on Google's official BigQuery client library

Attributes:
  • query – str the SQL text of the query
  • query_obj – Query the pbq.Query object
  • client – Client the BigQuery client object
  • bqstorage_client – BigQueryStorageClient the BigQuery Storage API client object
Methods:
  • to_dataframe(save_query=False, **params) – return the query results as a pd.DataFrame
  • to_csv(filename, sep=',', save_query=False, **params) – save the query results to a CSV file
  • save_to_table(table, dataset, project=None, replace=True, partition=None) – save the query results to a table
  • run_query() – execute the query
  • table_details(table, dataset, project) – return details about the table
  • save_file_to_table(filename, table, dataset, project, file_format=bigquery.SourceFormat.CSV, max_bad_records=0, replace=True, partition=None) – load a CSV or PARQUET file into a table; the load can target a single partition and can append to an existing table instead of replacing it
  • save_dataframe_to_table(df: pd.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None) – same as save_file_to_table, but loads a pandas DataFrame
  • table_exists(client: bigquery.Client, table_ref: bigquery.table.TableReference) – return True if the table exists, False otherwise

getting the query results as a dataframe

>>> from pbq import Query, PBQ
>>> query = Query("select * from table")
>>> print("the query price:", query.price)
>>> if not query.validate():
...     raise RuntimeError("table not valid")
>>> pbq = PBQ(query)
>>> pbq.to_dataframe()

saving the query results to a CSV file

>>> from pbq import Query, PBQ
>>> query = Query("select * from table")
>>> pbq = PBQ(query)
>>> pbq.to_csv('results.csv')

saving a dataframe to a table

>>> import pandas as pd
>>> from pbq import Query, PBQ
>>> df = pd.DataFrame()
>>> PBQ.save_dataframe_to_table(df, 'table', 'dataset', 'project_id', partition='20191013', replace=False)

run_query()

execute your query

static save_dataframe_to_table(df: pandas.core.frame.DataFrame, table, dataset, project, max_bad_records=0, replace=True, partition=None, validate_params=False)

save a pd.DataFrame object to a table

Parameters:
  • df – pd.DataFrame the dataframe you want to save
  • table – str table name
  • dataset – str data set name
  • project – str project name
  • max_bad_records – int maximum number of bad records to ignore before the load fails (default: 0)
  • replace – boolean if True, replace the table; otherwise append to it (default: True)
  • partition – str partition to write, in YYYYMMDD format, e.g. '20191013' (default: None)
  • validate_params – boolean validate the dataframe against the table schema before saving (default: False)
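
A partition string such as '20191013' in the example above is a YYYYMMDD date. BigQuery addresses a single day partition with a partition decorator, table$YYYYMMDD. The sketch below assumes pbq targets partitions that way; partitioned_table_id is a hypothetical helper for illustration, not part of pbq.

```python
from datetime import datetime
from typing import Optional


def partitioned_table_id(project: str, dataset: str, table: str,
                         partition: Optional[str] = None) -> str:
    """Build a table ID, optionally pointing at one day partition.

    Hypothetical helper (not pbq's API): it assumes partitions are addressed
    with BigQuery's ``table$YYYYMMDD`` partition decorator.
    """
    table_id = f"{project}.{dataset}.{table}"
    if partition is None:
        return table_id
    # Require exactly eight digits so short inputs like '2019113' are rejected.
    if len(partition) != 8 or not partition.isdigit():
        raise ValueError(f"partition must be YYYYMMDD, got {partition!r}")
    # Raises ValueError for impossible dates such as '20190230' or '13102019'.
    datetime.strptime(partition, "%Y%m%d")
    return f"{table_id}${partition}"
```

For example, partitioned_table_id('project_id', 'dataset', 'table', '20191013') returns 'project_id.dataset.table$20191013'.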

static save_file_to_table(filename, table, dataset, project, file_format='CSV', max_bad_records=0, replace=True, partition=None)

load a file into a table; the load can target a single partition and can append to an existing table instead of replacing it. Supported formats are CSV and PARQUET.

Parameters:
  • filename – str path of the file to load
  • table – str table name
  • dataset – str data set name
  • project – str project name
  • file_format – str file format, 'CSV' or 'PARQUET' (default: 'CSV')
  • max_bad_records – int maximum number of bad records to ignore before the load fails (default: 0)
  • replace – boolean if True, replace the table; otherwise append to it (default: True)
  • partition – str partition to write, in YYYYMMDD format, e.g. '20191013' (default: None)
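
The parameters of save_file_to_table line up with fields of a BigQuery load job: file_format with sourceFormat, max_bad_records with maxBadRecords, and replace with writeDisposition (WRITE_TRUNCATE to replace, WRITE_APPEND to append). The sketch below builds a REST-style configuration.load object from those parameters. The mapping is an assumption about what pbq does internally, and load_config is a hypothetical helper.

```python
from typing import Any, Dict

SUPPORTED_FORMATS = ("CSV", "PARQUET")


def load_config(table: str, dataset: str, project: str, file_format: str = "CSV",
                max_bad_records: int = 0, replace: bool = True) -> Dict[str, Any]:
    """Build a BigQuery REST ``configuration.load`` object (hypothetical helper)."""
    # Accept 'csv'/'parquet' as well; this normalization is a design choice of the sketch.
    file_format = file_format.upper()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file_format {file_format!r}; use CSV or PARQUET")
    if max_bad_records < 0:
        raise ValueError("max_bad_records must be >= 0")
    return {
        "destinationTable": {"projectId": project, "datasetId": dataset, "tableId": table},
        "sourceFormat": file_format,
        # Number of bad records BigQuery may ignore before the job fails.
        "maxBadRecords": max_bad_records,
        # replace=True overwrites the table; replace=False appends to it.
        "writeDisposition": "WRITE_TRUNCATE" if replace else "WRITE_APPEND",
    }
```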

save_to_table(table, dataset, project=None, replace=True, partition=None)

save the query results to a table

Parameters:
  • table – str table name
  • dataset – str data set name
  • project – str project name (default: None)
  • replace – boolean if True, replace the table; otherwise append to it (default: True)
  • partition – str partition to write, in YYYYMMDD format, e.g. '20191013' (default: None)

static table_details(table, dataset, project)

return a dict with details about the table

Parameters:
  • table – str table name
  • dataset – str data set name
  • project – str project name
Returns:

dict with table information such as last_modified_time, num_bytes, num_rows and creation_time

static table_exists(client: google.cloud.bigquery.client.Client, table_ref: google.cloud.bigquery.table.TableReference)

check whether a table exists

Parameters:
  • client – bigquery.Client object
  • table_ref – bigquery.table.TableReference object with the table name and dataset
Returns:

boolean True if the table exists, False otherwise
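
A common way to implement this check with the official client is to call Client.get_table and treat google.api_core.exceptions.NotFound as "missing". The sketch below follows that pattern; it is not necessarily how pbq implements table_exists. So that it also runs without google-cloud-bigquery installed, it falls back to a local stand-in exception.

```python
from typing import Any

try:
    from google.api_core.exceptions import NotFound
except ImportError:  # lets the sketch run without google-cloud-bigquery installed
    class NotFound(Exception):
        """Local stand-in for google.api_core.exceptions.NotFound."""


def table_exists(client: Any, table_ref: Any) -> bool:
    """Return True if client.get_table(table_ref) succeeds, False on NotFound."""
    try:
        client.get_table(table_ref)  # raises NotFound when the table is missing
    except NotFound:
        return False
    return True
```

Only NotFound is treated as "does not exist"; other errors, such as permission or network failures, still propagate to the caller instead of being reported as a missing table.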

to_csv(filename, sep=',', save_query=False, **params)

save the query results to a CSV file

to also save the query results to a table, set save_query to True and pass table and dataset in params; the table is saved in the same project

Parameters:
  • filename – str with the path to save the file
  • sep – str field delimiter for the CSV file
  • save_query – boolean whether to also save the query results to a table
  • params – dict the destination table and dataset, required when save_query is True
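
When save_query is True, the destination has to arrive through **params. The sketch below shows one way such a check could look; save_target and its error message are hypothetical, not pbq's own code, and it only assumes the table and dataset keys described above.

```python
from typing import Any, Optional, Tuple


def save_target(save_query: bool, **params: Any) -> Optional[Tuple[str, str]]:
    """Return (dataset, table) when save_query is set, else None (hypothetical helper)."""
    if not save_query:
        return None
    # Both keys must be present and non-empty before a table can be written.
    missing = [key for key in ("table", "dataset") if not params.get(key)]
    if missing:
        raise ValueError(f"save_query=True requires params: {', '.join(missing)}")
    return params["dataset"], params["table"]
```

With pbq itself, the call would presumably look like pbq.to_csv('results.csv', save_query=True, table='my_table', dataset='my_dataset'), where the table and dataset names are placeholders.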

to_dataframe(save_query=False, **params)

return the query results as a pd.DataFrame

to also save the query results to a table, set save_query to True and pass table and dataset in params; the table is saved in the same project

Parameters:
  • save_query – boolean whether to also save the query results to a table
  • params – dict the destination table and dataset, required when save_query is True
Returns:

pd.DataFrame the query results