完善的SQL计算能力
$select* fromd:/Orders.csv whereAmount>=100like:
$select* fromd:/Orders.csv whereClientlike'%bro%'空值判断:
$select* fromd:/Orders.csv whereClientisnull与、或、非这样的逻辑运算符可以把比较运算组合起来,实现组合过滤:
$select* fromd:/Orders.csv wherenotAmount>=100andClientlike'bro'orOrderDate isnull$select* fromd:/Orders.csv whereClientin('TAS','KBRO','PNS')$select* fromd:/Orders.csv where(OrderDate<date('2020-01-01') andAmount<=100)or(OrderDate>=date('2020-12-31') andAmount>100)2. 计算列
$selectround(Amount,2), price*quantity fromd:/Orders.csv$selectleft(Client,4) fromd:/Orders.csv$selectyear(OrderDate) fromd:/Orders.csv
$selectcaseyear(OrderDate) when2021then'this year'when2020then'last year'else'previous years'endfromd:/Orders.csv$selectcoalesce(Client,'unknown') fromd:/Orders.csv$selectOrderId, Amount, OrderDate fromd:/Orders.csv$select* fromd:/Orders.csv orderbyClient, Amount desc$selectdistinctClient,Sellerid fromd:/Orders.csv$selectyear(OrderDate),Client,sum(Amount),count(1) fromd:/Orders.csvgroupbyyear(OrderDate),Clienthavingsum(Amount)<=100
$selectavg(Amount) fromd:/Orders.csv$selecto.OrderId,o.Client,e.Name e.Dept,e.EId fromd:/Orders.txt oleftjoind:/Employees.txt e ono.SellerId=e.Eid
右关联:
$selecto.OrderId,o.Client,e.Name e.Dept,e.EId fromd:/Employees.txt erightjoind:/Orders.txt o ono.SellerId=e.Eid
全关联:
$selecto.OrderId,o.Client,e.Name e.Dept,e.EId fromd:/Employees.txt efulljoind:/Orders.txt o ono.SellerId=e.EId
内关联:
$selecto.OrderId,o.Client,e.Name e.Dept fromd:/Orders.csv oinnerjoind:/Employees.csv e ono.SellerId=e.Eid
内关联还可以写成WHERE形式:
$selecto.OrderId,o.Client,e.Name e.Dept fromd:/Orders.csv o ,d:/Employees.csv ewhereo.SellerId=e.Eid
$selectt.Client, t.s, ct.Name, ct.address from(selectClient,sum(amount) s fromd:/Orders.csv groupbyClient) tleftjoinClientTable ct ont.Client=ct.Client
with:
$witht as(selectClient,sum(amount) s fromd:/Orders.csv groupbyClient)selectt.Client, t.s, ct.Name, ct.address fromtleftjoinClientTable ct ont.Client=ct.Client
$select* fromd:/Orders.txt o whereo.sellerid in(selecteid fromd:/Employees.txt)$selectprice*quantity assubtotal fromd:/detail.csvSelect* fromOrders1.csvUnionallSelect* fromOrders2.csv
11. into输出结果
$selectdept,count(1) c,sum(salary) s intodeptResult.xlsx fromemployee.txt groupbydept havings>100000丰富的数据源支持
$select* fromd:/Orders.txt whereAmount>=100andClientlike'bro'orOrderDate isnull如果分隔符不是逗号和tab,就要用SPL扩展函数处理了,比如分隔符是冒号:
$select* from{file("d:/Orders.txt").import@t (;":")}whereAmount>=100and Client like 'bro'or OrderDate isnull
没有标题行的文件,可以用序号表示列名:
$select* from{file("d:/Orders.txt").import()} where_4>=100and_2 like'bro'or_5 isnull某些特殊格式的字符串也要用扩展函数解析,比如日期格式不是标准的yyyy-MM-dd:
$selectyear(OrderDate),sum(Amount) from{file("d:/Orders.txt").import@t (orderid,client,sellerid,amount,orderdate:date:"dd-MM-yyyy")}groupbyyear(OrderDate)
Excel文件上也可以执行SQL,对于格式规范的Excel,只需直接引用文件名:
$select* fromd:/Orders.xlsx whereAmount>=100andClientlike'bro'orOrderDate isnull可以读取指定sheet:
$select* from{file("D:/Orders.xlsx").xlsimport@t (;"sheet3")}whereAmount>=100and Client like 'bro'or OrderDate isnull
从远程网站下载来的csv/xls文件:
$select * from {httpfile("http://127.0.0.1:6868/Orders.csv).import@tc() }where Amount>=100 and Client like 'bro' or OrderDate is null
HTTP协议的特性很多,比如字符集、端口号、post参数、header参数、登录认证等等,SPL扩展函数都可以支持。扩展函数还可以抓取网页中的表格数据,也支持从FTP服务器下载文件,这里不再赘述。
$select* from{json(file("d:\\data.json").read())}whereAmount>=100andClientlike'bro'orOrderDate isnull
二维Json比较少,多层才是常态,SPL扩展函数可以把多层数据转为二维记录,再用SQL计算,这里就不展开了。
$select* from{json(httpfile("http://127.0.0.1:6868/api/getData").read())}whereAmount>=100andClientlike'bro'orOrderDate isnull
扩展函数如果比较多比较长,可以写成分步形式:
| A | |
| 1 | =httpfile("http://127.0.0.1:6868/api/getData") |
| 2 | =A1.read() |
| 3 | =json(A2) |
| 4 | $select * from {A3} where Amount>=100 and Client like 'bro' or OrderDate is null |
类似csv/xls,SPL也可以读取HTTP网站上的json/xml文件。
$select* from{xml(file("d:/data.xml").read(),"xml/row")}whereAmount>=100andClientlike'bro'orOrderDate isnull
Web Service
$select*from{ws_call(ws_client("http://。/entityWS.asmx?wsdl"),"entityWS ":" entityWSSoap":"getData")}whereAmount>=100andClientlike'bro'orOrderDate isnull
NoSQL也不在话下。
$select* from{mongo_shell@x (mongo_open("mongodb://127.0.0.1:27017/mongo"),"main.find()")}whereAmount>=100andClientlike'bro'orOrderDate isnull
MongoDB经常是多层数据,包括前面的restful、web Service,用SPL扩展函数都可以转为二维。
$select* from{sf_query(sf_open(),"/services/data/v51.0/query","Select Id,CaseNumber,Subject From Case where Status='New'")} whereAmount>=100andClientlike'bro'orOrderDate isnullHadoop HDFS上的csv/xls/json/xml:
| A | |
| 1 | =hdfs_open(;"hdfs://192.168.0.8:9000") |
| 2 | =hdfs_file(A1,"/user/Orders.csv":"GBK") |
| 3 | =A2.import@t() |
| 4 | =hdfs_close(A1) |
| 5 | $select Client,sum(Amount) from {A3} group by Client |
HBase也支持:
| A | |
| 1 | =hbase_open("hdfs://192.168.0.8", "192.168.0.8") |
| 2 | =hbase_scan(A1,"Orders") |
| 3 | =hbase_close(A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
Hbase还有filter、cmp等取数方式,SPL都可以支持。
| A | |
| 1 | =hive_client("hdfs://192.168.0.8:9000","thrift://192.168.0.8:9083","hive","asus") |
| 2 | =hive_query(A1, "select * from table") |
| 3 | =hive_close() |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
Spark
| A | |
| 1 | =spark_client("hdfs://192.168.0.8:9000","thrift://192.168.0.8:9083","aa") |
| 2 | =spark_query(A1,"select * from tablename") |
| 3 | =spark_close(A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
阿里云
| A | |
| 1 | =ali_open("http://test.ots.aliyuncs.com","LTAIXZNG5zzSPHTQ","sa","test") |
| 2 | =ali_query@x(A1,"test",["id1","id2"],[1,"10001"]:[10,"70001"], ["id1","id2","f1","f2"],f1>=2000.0) |
| 3 | $select Client,sum(Amount) from {A2} group by Client |
Cassandra
| A | |
| 1 | =stax_open("127.0.0.1":9042,"mycasdb","cassandra":"cassandra") |
| 2 | =stax_query(A1,"select * from user where id=?",1) |
| 3 | =stax_close(A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
ElasticSearch
| A | |
| 1 | =es_open("localhost:9200","user":"un1234") |
| 2 | =es_get(A1,"/person/_mget","{\"ids\":[\"1\",\"2\",\"5\"]}") |
| 3 | =es_close(A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
Redis
| A | |
| 1 | =redis_open() |
| 2 | =redis_hscan(A1, "runoobkey", "v*", 3) |
| 3 | =redis_close (A1) |
| 4 | $select key,value from {A2} where value>=2000 and value<3000 |
SAP BW
| A | |
| 1 | =sap_open("userName","passWord","192.168.0.188","00","000",”E") |
| 2 | =sap_cursor(A1, "Z_TEST1","IT_ROOM").fetch() |
| 3 | =sap_close(A1) |
| 4 | $select * from {A2} where Vendor like '%software%' |
InfluxDB
| A | |
| 1 | =influx_open("http://127.0.0.1:8086", "mydb", "autogen", "admin", "admin") |
| 2 | =influx_query(A1, "SELECT * FROM Orders") |
| 3 | =influx_close(A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
| A | |
| 1 | =kafka_open("D://kafka.properties";"topic-test") |
| 2 | =kafka_poll(A1) |
| 3 | =kafka_close (A1) |
| 4 | $select Client,sum(Amount) from {A2} group by Client |
| A | |
| 1 | =olap_open("http://192.168.0.178:8088/msmdpump.dll","CubeTest","Administrator","admin") |
| 2 | =olap_query(A1,"with member [Measures].[AnnualInterestRate] as'[Measures].[SalesAmount]/[Measures].[StandardCost]-1'select {[Measures].[SalesAmount],[Measures].[StandardCost],[Measures].[ AnnualInterestRate]} on columns, {[Order Date].[Calendar Year].[Calendar Year]} on rows from [DataSourceMulti]") |
| 3 | =olap_close(A1) |
| 4 | $select * from {A2} where SalesAmount>10000 |
SPL除了支持种类繁多的数据源,也可以进行数据源之间的混合计算。比如csv和RDB之间:
$selecto.OrderId,o.Client,e.Name e.Dept fromd:/Orders.csv o innerjoind:/Employees.xls e ono.SellerId=e.Eid| A | B | |
| 1 | =mongo_open("mongodb://127.0.0.1:27017/mongo") | |
| 2 | =mongo_shell@x(A1,"detail.find()").fetch() | =connect("orcl").query@x("select * from main") |
| 3 | $select d.title, m.path,sum(d.amount) from {A2} as d left join {B2} as m on d.cat=m.cat group by d.title, m.path | |
任意数据源之间都可以进行混合计算,而且SQL语法不受数据源的影响。
更深入的计算能力
selectmax(continuousDays)-1from(selectcount(*) continuousDaysfrom(selectsum(changeSign) over(orderbytradeDate) unRiseDaysfrom(selecttradeDate,casewhenprice>lag(price) over(orderbytradeDate) then0else1endchangeSignfromAAPL) )groupbyunRiseDays)
而SPL只需两行:
| A | B | |
| 1 | =T("d:/AAPL.xlsx") | 读Excel文件,首行为列名 |
| 2 | =a=0,A1.max(a=if(price>price[-1],a+1,0)) | 求最长连续上涨天数 |
对于简单的运算,使用基本的SQL是很方便的,但运算需求变复杂时,SQL就不适用了,即使提供更多的功能(比如窗口函数)也不能简化计算。这种情况下,我们推荐用户直接使用代码简洁的SPL,而不必再写多层嵌套的复杂SQL了。基于这个原因,SPL中的SQL也只支持到SQL92标准,没有提供包括窗口函数在内的更多语法。
| A | B | |
| 1 | = T("D:/data/sales.csv").sort(amount:-1) | 取数并逆序排序 |
| 2 | =A1.cumulate(amount) | 计算累计序列 |
| 3 | =A2.m(-1)/2 | 最后的累计值即是总和 |
| 4 | =A2.pselect(~>=A3) | 超过一半的位置 |
| 5 | =A1(to(A4)) | 按位置取值 |
灵活的应用结构

D:\raqsoft64\esProc\bin>esprocx.exe -R select Client,sum(Amount) from d:/Orders.csv group by ClientLoglevel:INFOARO899.0BDR4278.8BON2564.4BSF14394.0CHO1174.0CHOP1420.0DYD1242.0…
对于应用程序中的计算,SPL提供了标准的JDBC驱动,可以方便地集成进JAVA:
…Class.forName("com.esproc.jdbc.InternalDriver");Connection conn =DriverManager.getConnection("jdbc:esproc:local://");PrepareStatement st = conn.prepareStatement("$select* from employee.txt where SALARY >=? and SALARY<?");st.setObject(1, 3000);st.setObject(2, 5000);ResultSet result=st.execute();...
应用中会有些频繁修改或较为复杂的计算,SPL允许将代码外置于JAVA程序,可显著降低代码耦合性。比如上面的SPL代码可以先存为脚本文件,再在JAVA中以存储过程的形式调用:
…Class.forName("com.esproc.jdbc.InternalDriver");Connectionconn =DriverManager.getConnection("jdbc:esproc:local://");Statementst = connection.();CallableStatementst = conn.prepareCall("{call getQuery(?, ?)}");st.setObject(1,3000);st.setObject(2,5000);ResultSetresult=st.execute();...
有了SPL这样的开源神器,没有RDB也能轻松使用SQL了。

重磅!开源SPL交流群成立了
简单好用的SPL开源啦!
为了给感兴趣的小伙伴们提供一个相互交流的平台,
特地开通了交流群(群完全免费,不广告不卖课)
需要进群的朋友,可长按扫描下方二维码

本文感兴趣的朋友,请转到阅读原文去收藏 ^_^