public class GpuColumnBatch extends Object
| Constructor and Description |
|---|
| `GpuColumnBatch(ai.rapids.cudf.Table table, org.apache.spark.sql.types.StructType schema)` |
| Modifier and Type | Method and Description |
|---|---|
| `long` | `getColumn(int index)` |
| `ai.rapids.cudf.ColumnVector` | `getColumnVector(int index)` |
| `ai.rapids.cudf.ColumnVector` | `getColumnVectorInitHost(int index)` |
| `int` | `getIntInColumn(int dataIndex, int colIndex, int defVal)` |
| `int` | `getNumColumns()` |
| `long` | `getNumRows()` |
| `static ai.rapids.cudf.DType` | `getRapidsType(org.apache.spark.sql.types.DataType type)` |
| `org.apache.spark.sql.types.StructType` | `getSchema()` |
| `int` | `groupAndAggregateOnColumnsHost(int groupIdx, int weightIdx, int prevTailGid, List<Integer> groupInfo, List<Float> weightInfo)` — Groups the data by column `groupIdx` and performs a "count" aggregation on that column, while also aggregating column `weightIdx` similarly to "average", except that all values within a group are required to be equal. The results are then merged into `groupInfo` and `weightInfo`, respectively. |
public GpuColumnBatch(ai.rapids.cudf.Table table,
org.apache.spark.sql.types.StructType schema)
public org.apache.spark.sql.types.StructType getSchema()
public long getNumRows()
public int getNumColumns()
public ai.rapids.cudf.ColumnVector getColumnVector(int index)
public long getColumn(int index)
public ai.rapids.cudf.ColumnVector getColumnVectorInitHost(int index)
public int getIntInColumn(int dataIndex,
int colIndex,
int defVal)
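The exact null-handling contract of `getIntInColumn` is not spelled out in this page. The following plain-Java sketch illustrates one plausible reading: return the value at row `dataIndex` of column `colIndex`, falling back to `defVal` when the entry is null. The class `IntColumnSketch` is a hypothetical CPU stand-in, not the real implementation, which reads from cudf columns.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical CPU stand-in for GpuColumnBatch.getIntInColumn: the real class
// reads from ai.rapids.cudf columns; here each column is a plain Integer array
// and a null entry models a null value in the cudf column.
public class IntColumnSketch {
    private final List<Integer[]> columns;

    public IntColumnSketch(List<Integer[]> columns) {
        this.columns = columns;
    }

    // Returns the int at row dataIndex of column colIndex, or defVal when
    // the entry is null (an assumed reading of the real method's contract).
    public int getIntInColumn(int dataIndex, int colIndex, int defVal) {
        Integer v = columns.get(colIndex)[dataIndex];
        return v == null ? defVal : v;
    }

    public static void main(String[] args) {
        IntColumnSketch batch = new IntColumnSketch(Arrays.asList(
                new Integer[]{7, null, 9}));
        System.out.println(batch.getIntInColumn(0, 0, -1)); // value present: 7
        System.out.println(batch.getIntInColumn(1, 0, -1)); // null -> -1
    }
}
```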
public int groupAndAggregateOnColumnsHost(int groupIdx,
int weightIdx,
int prevTailGid,
List<Integer> groupInfo,
List<Float> weightInfo)
Parameters:

- `groupIdx` - The index of the column to group by.
- `weightIdx` - The index of the column from which to read a value for each group.
- `prevTailGid` - The group id of the last group in the previous `groupInfo`.
- `groupInfo` - Group information accumulated from earlier batches.
- `weightInfo` - Weight information accumulated from earlier batches.

public static ai.rapids.cudf.DType getRapidsType(org.apache.spark.sql.types.DataType type)
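To make the batch-merging behavior of `groupAndAggregateOnColumnsHost` concrete, here is a hedged plain-Java sketch of its semantics as described above: rows arrive grouped by a group-id column, each group contributes a count and a single weight (all weight values within a group are assumed equal), and a group that spans a batch boundary (detected via `prevTailGid`) extends the previous batch's last entry instead of starting a new one. `GroupAggSketch` and its return convention (the tail group id of this batch) are assumptions for illustration, not the actual GPU code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical CPU sketch of the groupAndAggregateOnColumnsHost semantics.
public class GroupAggSketch {
    // gids: group-id column values, in group order.
    // weights: weight column values (assumed equal within one group).
    // Returns the group id of the last group seen in this batch.
    public static int groupAndAggregate(int[] gids, float[] weights,
                                        int prevTailGid,
                                        List<Integer> groupInfo,
                                        List<Float> weightInfo) {
        for (int i = 0; i < gids.length; i++) {
            boolean continuesPrev = i == 0 && !groupInfo.isEmpty()
                    && gids[0] == prevTailGid;
            boolean sameAsLastRow = i > 0 && gids[i] == gids[i - 1];
            if (continuesPrev || sameAsLastRow) {
                // Extend the current (possibly batch-spanning) group's count.
                int last = groupInfo.size() - 1;
                groupInfo.set(last, groupInfo.get(last) + 1);
            } else {
                // A new group starts: count 1, record its constant weight.
                groupInfo.add(1);
                weightInfo.add(weights[i]);
            }
        }
        return gids.length == 0 ? prevTailGid : gids[gids.length - 1];
    }

    public static void main(String[] args) {
        List<Integer> groupInfo = new ArrayList<>();
        List<Float> weightInfo = new ArrayList<>();
        // Batch 1: groups 0 (two rows, weight 2) and 1 (one row, weight 3).
        int tail = groupAndAggregate(new int[]{0, 0, 1},
                new float[]{2f, 2f, 3f}, -1, groupInfo, weightInfo);
        // Batch 2: group 1 continues across the boundary, then group 2 starts.
        tail = groupAndAggregate(new int[]{1, 2},
                new float[]{3f, 5f}, tail, groupInfo, weightInfo);
        System.out.println(groupInfo);  // per-group row counts
        System.out.println(weightInfo); // per-group weights
    }
}
```

Note how `prevTailGid` is what lets a group split across two `GpuColumnBatch` instances be counted once rather than twice.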
Copyright © 2020. All rights reserved.