Golang's strings.Split(): Notes on a Pitfall

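Background

At work, whenever we need to split a string into a slice of strings on some separator, strings.Split() is the usual tool.

I recently hit a pitfall with it, so I dug into the cause and also collected the other pitfalls strings.Split can lead to. This post is the resulting retrospective, written up to share.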

Scenario

I needed to read one field of a struct and split it on ",". The logic looked roughly like this:

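```go
package main

import (
	"log"
	"strings"
)

type Info struct {
	Ids string // e.g. Ids: "123,456"
}

func test3(info Info) {
	ids := info.Ids
	idList := strings.Split(ids, ",")
	if len(idList) < 1 {
		return // expected to bail out here when Ids is empty
	}
	log.Println("ids-not-empty")
	// ***
}

func main() {
	test3(Info{Ids: ""})
}
```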


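When ids was "" (an empty string), the console still printed ids-not-empty. I was baffled at the time, because the function should have hit the return. That made me curious enough to investigate properly.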

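Prerequisites

Before investigating, here is a quick look at how a Go string is represented.

At runtime, a Go string is a two-word header, and reflect.StringHeader describes its layout. The version below is the reflect package's internal form, which declares Data as unsafe.Pointer. The exported reflect.StringHeader declares Data as uintptr instead.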

```go
type stringHeader struct {
	Data unsafe.Pointer
	Len  int
}
```

Data points to the underlying byte array, and Len is the length of that array in bytes.

Investigation

Verification

Since the if condition evaluated to false, let's print the actual length of idList and each of its elements:

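```go
// Needs "log" and "strings"; Info is the struct defined above.
func test3(info Info) {
	ids := info.Ids
	idList := strings.Split(ids, ",")
	log.Printf("idList length: [%d], idList: [%v]", len(idList), idList)
	// Print every element individually so an empty string is easy to spot.
	for index := range idList {
		log.Printf("idList[%d]:[%v]", index, idList[index])
	}
	// ***
}
```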

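With ids set to "", the output shows that idList has length 1 and that its only element, idList[0], is the empty string. The slice is not empty at all.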

Printing the underlying data

With my curiosity growing, I printed the low-level details of ids and idList:

```go
package main

import (
	"log"
	"reflect"
	"strings"
	"unsafe"
)

type Info struct {
	Ids string // e.g. "123,456"
}

const (
	basePrintInfoV3 = "%s string address:[%p], backing array address:[%#x], Len field address:[%p], Len value:[%v]"
	basePrintInfoV2 = "%s slice address:[%p], backing array address:[%#x], Len field address:[%p], Len value:[%v]"
)

func test3(info Info) {
	ids := info.Ids
	idList := strings.Split(ids, ",")
	getStringPtr("ids", &ids)
	getStringSliceAllPtr("idList", &idList)
	// ***
}

// getStringPtr views *string as a *reflect.StringHeader (Data, Len).
func getStringPtr(name string, str *string) {
	s2 := (*reflect.StringHeader)(unsafe.Pointer(str))
	log.Printf(basePrintInfoV3, name, str, s2.Data, &s2.Len, s2.Len)
}

// getStringSliceAllPtr views *[]string as a *reflect.SliceHeader (Data, Len, Cap).
func getStringSliceAllPtr(name string, s1 *[]string) {
	s2 := (*reflect.SliceHeader)(unsafe.Pointer(s1))
	log.Printf(basePrintInfoV2, name, s1, s2.Data, &s2.Len, s2.Len)
}

func main() {
	test3(Info{Ids: ""})
}
```

The output confirms it: the Len field of ids is 0, but the Len field of idList is 1.

Reading the source

Splitting ids produced a slice that differs from what I expected, so Split must treat this case specially. Time to read the source.

```go
func Split(s, sep string) []string { return genSplit(s, sep, 0, -1) }
```

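One read through the source makes the logic of genSplit clear:

1. Work out in advance the number of pieces n that s splits into.
2. Allocate a slice of length n.
3. Walk s and store each piece in the slice.
4. Return the slice.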

```go
func genSplit(s, sep string, sepSave, n int) []string {
	if n == 0 {
		return nil
	}
	if sep == "" {
		return explode(s, n)
	}
	if n < 0 {
		// Count how many pieces s splits into on sep
		n = Count(s, sep) + 1
	}
	a := make([]string, n)
	n--
	i := 0
	for i < n {
		// Locate the first sep in s
		m := Index(s, sep)
		if m < 0 {
			break
		}
		// Store the piece in the result
		a[i] = s[:m+sepSave]
		// Drop the piece and its separator from s
		s = s[m+len(sep):]
		i++
	}
	a[i] = s
	return a[:i+1]
}
```

So the answer should lie in Count.

Count returns the number of non-overlapping instances of substr in s:

```go
func Count(s, substr string) int {
	// special case
	if len(substr) == 0 {
		return utf8.RuneCountInString(s) + 1
	}
	if len(substr) == 1 {
		return bytealg.CountString(s, substr[0])
	}
	n := 0
	for {
		i := Index(s, substr)
		if i == -1 {
			return n
		}
		n++
		s = s[i+len(substr):]
	}
}
```

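Here sep is ",", so Count takes the len(substr) == 1 branch. In that branch, bytealg.CountString counts how many times the byte substr[0] occurs in s. Stepping into it, I saw that it returned 0, which is exactly what you would expect for an empty string.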

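Now combine that with n = Count(s, sep) + 1 in genSplit. The slice allocated up front has length 0 + 1 = 1. After n--, the loop condition i < n becomes 0 < 0, so the loop never runs, and a[0] = s stores the empty string. Split therefore returns [""], a slice of length 1, which means len(idList) < 1 can never be true. Mystery solved!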

Similar cases

After some more reading, here is a summary of other pitfalls you may run into with strings.Split:

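```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := strings.Split("", "")
	fmt.Println(s, len(s)) // [] 0   empty sep on empty input: empty slice

	s = strings.Split("abc,abc", "")
	fmt.Println(s, len(s)) // [a b c , a b c] 7   empty sep: one element per UTF-8 sequence

	s = strings.Split("", ",")
	fmt.Println(s, len(s)) // [] 1   one element, the empty string ""

	s = strings.Split("abc,abc", ",")
	fmt.Println(s, len(s)) // [abc abc] 2

	s = strings.Split("abc,abc", "|")
	fmt.Println(s, len(s)) // [abc,abc] 1   sep not found: the whole input

	fmt.Println(len(""))           // 0
	fmt.Println(len([]string{""})) // 1

	str := ""
	fmt.Println(str[0]) // panic: runtime error: index out of range [0] with length 0
}
```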
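Println prints [] both for an empty slice and for [""], which is exactly what makes this pitfall easy to miss. Formatting with %q quotes every element, so empty strings become visible. quoted is a hypothetical helper for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// quoted (hypothetical) formats a strings.Split result with %q so that
// empty elements show up as "" instead of disappearing.
func quoted(s, sep string) string {
	return fmt.Sprintf("%q", strings.Split(s, sep))
}

func main() {
	fmt.Println(quoted("", ""))       // []
	fmt.Println(quoted("", ","))      // [""]
	fmt.Println(quoted("a,,b,", ",")) // ["a" "" "b" ""]
}
```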