@SamYuan1990 SamYuan1990 commented Aug 17, 2025

I'm leaving the PR comments aside for now, since this is a POC.
This PR is an attempt at the TypeScript part of #64: I used typescript-language-server to set up TypeScript support and have started trying it locally.

For now the PR only shows the code style and what is going on; as you can see, there are many TODOs and hard-coded values.

issues found:

  • I got stuck implementing FunctionSymbol for the TypeScript spec: the spec interface is hard to understand, and there seem to be no test cases for the current specs (Python and C). So what is it supposed to do?
  • Args support is missing, so I hard-coded the arguments, since typescript-language-server needs parameters to start.

Signed-off-by: SamYuan1990 <[email protected]>
@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@SamYuan1990 SamYuan1990 changed the title from "give a try with typescript" to "[WIP]give a try with typescript" on Aug 17, 2025
@Hoblovski
Collaborator

Hoblovski commented Aug 18, 2025

Hi, thanks for the PR!

  • I got stuck implementing FunctionSymbol for the TypeScript spec: the spec interface is hard to understand, and there seem to be no test cases for the current specs (Python and C). So what is it supposed to do?

It extracts the parameters, output types and receiver for a function.

What might be confusing is that FunctionSymbol works on the semantic tokens of a function, not on the function's AST or its verbatim source text.
Yes, this is not the best approach AFAIK, and we're discussing potential alternatives...

I'll try to add some tests in a few days for Python and C.
But for now:

Consider the bar function

# testdata/python/2_class
class Foo:
    def __init__(self):
        self.x = 5

    def bar(self, v: int) -> int:
        self.x += v
        return self.x

Add some logging to FunctionSymbol and we get

[INFO]16:37:49 spec.go:286: PythonSpec: FunctionSymbol for bar
[INFO]16:37:49 spec.go:288: token: bar, type: function, modifiers: []
[INFO]16:37:49 spec.go:288: token: self, type: parameter, modifiers: []
[INFO]16:37:49 spec.go:288: token: v, type: parameter, modifiers: []
[INFO]16:37:49 spec.go:288: token: int, type: class, modifiers: []
[INFO]16:37:49 spec.go:288: token: int, type: class, modifiers: []
[INFO]16:37:49 spec.go:288: token: self, type: parameter, modifiers: []
[INFO]16:37:49 spec.go:288: token: v, type: parameter, modifiers: []
[INFO]16:37:49 spec.go:288: token: self, type: parameter, modifiers: []
[INFO]16:37:49 spec.go:381: PythonSpec: FunctionSymbol for bar, receiver: -1, typeParams: [], inputParams: [3], outputParams: [4]

So bar has tokens

bar self v int int self v self
0   1    2 3   4   5    6 7
           ^ input parameter
               ^ output parameter

It has no receiver token (method info is collected in the ImplSymbol function).
No type parameters.
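
To make the index arithmetic concrete, here is a minimal Go sketch that resolves the indices from the log above back to tokens. The `Token` type and the `resolve` helper are hypothetical stand-ins for illustration only; they are not abcoder's actual types.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for one semantic token of a symbol.
type Token struct {
	Text string
	Type string
}

// resolve maps token indices (as FunctionSymbol would return them)
// back to token texts. Out-of-range indices, including -1, are skipped.
func resolve(tokens []Token, indices []int) []string {
	out := make([]string, 0, len(indices))
	for _, i := range indices {
		if i < 0 || i >= len(tokens) {
			continue
		}
		out = append(out, tokens[i].Text)
	}
	return out
}

func main() {
	// Semantic tokens of bar, in the order shown in the log.
	tokens := []Token{
		{"bar", "function"},   // 0
		{"self", "parameter"}, // 1
		{"v", "parameter"},    // 2
		{"int", "class"},      // 3: type of v
		{"int", "class"},      // 4: return type
		{"self", "parameter"}, // 5
		{"v", "parameter"},    // 6
		{"self", "parameter"}, // 7
	}
	fmt.Println("inputs: ", resolve(tokens, []int{3}))
	fmt.Println("outputs:", resolve(tokens, []int{4}))
}
```

Both index lists resolve to the `int` class token, which is what "bar depends on int" means here.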

@Hoblovski
Collaborator

  • Args support is missing, so I hard-coded the arguments, since typescript-language-server needs parameters to start.

That's a valid point! Hard-coding is fine until the final submission.

You can skip locating typescript-language-server, because it's usually the user's responsibility to make sure it can be found in PATH.

For the LSP args, we have a couple of alternatives:

  1. simply add a command-line flag, like ./abcoder parse typescript ... -lsp_flags "--stdio"
  2. move language server startup into the per-language specifications (checkLSP and checkRepoPath too)

What do you suggest @AsterDY ? Let's move those functions into per-language specs and just register them in parse.go?
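
As a rough illustration of how the two alternatives could combine, the Go sketch below keeps a per-language launch description with default arguments and lets a flag string override them. Everything here (`lspLaunch`, `launchers`, `buildCommand`) is hypothetical and not part of abcoder; only the `--stdio` argument comes from the discussion above.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// lspLaunch is a hypothetical per-language description of how to
// start a language server.
type lspLaunch struct {
	Command     string
	DefaultArgs []string
}

// launchers is an illustrative registry keyed by language name.
var launchers = map[string]lspLaunch{
	"typescript": {
		Command:     "typescript-language-server",
		DefaultArgs: []string{"--stdio"},
	},
}

// buildCommand prepares (but does not start) the server process.
// A non-empty flags string, e.g. the value of an -lsp_flags option,
// replaces the defaults. Splitting is whitespace-only, so quoted
// arguments containing spaces are not supported in this sketch.
func buildCommand(lang, flags string) (*exec.Cmd, error) {
	l, ok := launchers[lang]
	if !ok {
		return nil, fmt.Errorf("no language server registered for %q", lang)
	}
	args := l.DefaultArgs
	if f := strings.Fields(flags); len(f) > 0 {
		args = f
	}
	return exec.Command(l.Command, args...), nil
}

func main() {
	cmd, err := buildCommand("typescript", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cmd.Args)
}
```

With this shape, the binary lookup stays the user's responsibility (it must be on PATH), and each language only has to register its own defaults.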

@Hoblovski
Collaborator

#64 (comment)

I just saw that someone else is also working on TypeScript.

@SamYuan1990
Author

Thanks @Hoblovski.
Could you make the spec more explicit, for example like this?
Before:

	// if a symbol is an impl symbol, return the token index of interface type, receiver type and first-method start (-1 means not found)
	// ortherwise the collector will use FunctionSymbol() as receiver type token index (-1 means not found)
	ImplSymbol(sym DocumentSymbol) (int, int, int)

	// if a symbol is a Function or Method symbol,  return the token index of Receiver (-1 means not found),TypeParameters, InputParameters and Outputs
	FunctionSymbol(sym DocumentSymbol) (int, []int, []int, []int)

After:

	// if a symbol is an impl symbol, return the token index of interface type, receiver type and first-method start (-1 means not found)
	// ortherwise the collector will use FunctionSymbol() as receiver type token index (-1 means not found)
	ImplSymbol(sym DocumentSymbol) (int, int, int)

	// if a symbol is a Function or Method symbol,  return the token index of Receiver (-1 means not found),TypeParameters, InputParameters and Outputs
	FunctionSymbol(sym DocumentSymbol) (receiver int, typeParams []int, inputParams []int, output_parameter []int)

By the way, I am not sure whether output_parameter is the right name, since from your comments outputParams is in fact the return type (int here).

So these ints are locations/indices from the LSP server? May I ask how to convert the LSP server response into an array of ints?

@Hoblovski
Collaborator

Hoblovski commented Aug 18, 2025

Thanks @Hoblovski. Could you make the spec more explicit, for example like this? Before:

We could use some better examples. Still, that might have to wait a few days until I've got the time.

By the way, I am not sure whether output_parameter is the right name, since from your comments outputParams is in fact the return type (int here).

So these ints are locations/indices from the LSP server?

I just fixed an indentation error in the original diagram.
Yes, the input parameters and output parameters are indices into the semantic tokens of a symbol.

The FunctionSymbol output basically says "bar depends on int (input) and int (output)".

May I ask how to convert the LSP server response into an array of ints?

During preprocessing (in Collector.Collect), the abcoder framework issues a textDocument/semanticTokens request for each symbol and stores the result ([]Token) in sym.Tokens.
So an index i simply refers to the token sym.Tokens[i].
The per-language specifications then use sym.Tokens to implement FunctionSymbol and ImplSymbol.
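
For background on the raw response: in the LSP specification, a semantic tokens result carries a flat integer array with five entries per token (deltaLine, deltaStart, length, tokenType, tokenModifiers), where positions are relative to the previous token and tokenType indexes into the server's legend. The Go sketch below decodes that array under those rules. The `Token` struct, the `decode` helper, and the legend values are assumptions for illustration and are not abcoder's implementation; modifiers are ignored for brevity.

```go
package main

import "fmt"

// Token is a hypothetical decoded semantic token with absolute position.
type Token struct {
	Line, Start, Length int
	Type                string
}

// decode turns the flat LSP "data" array into tokens.
// Each token occupies 5 integers:
// deltaLine, deltaStart, length, tokenType, tokenModifiers.
func decode(data []uint32, legend []string) ([]Token, error) {
	if len(data)%5 != 0 {
		return nil, fmt.Errorf("data length %d is not a multiple of 5", len(data))
	}
	tokens := make([]Token, 0, len(data)/5)
	line, start := 0, 0
	for i := 0; i < len(data); i += 5 {
		deltaLine, deltaStart := int(data[i]), int(data[i+1])
		if deltaLine > 0 {
			// New line: the start column is absolute within that line.
			line += deltaLine
			start = deltaStart
		} else {
			// Same line: the start column is relative to the previous token.
			start += deltaStart
		}
		typ := "unknown"
		if idx := int(data[i+3]); idx < len(legend) {
			typ = legend[idx]
		}
		tokens = append(tokens, Token{Line: line, Start: start, Length: int(data[i+2]), Type: typ})
	}
	return tokens, nil
}

func main() {
	// Assumed legend and data for the signature line
	// "    def bar(self, v: int) -> int:" on line 4 (0-based).
	legend := []string{"function", "parameter", "class"}
	data := []uint32{
		4, 8, 3, 0, 0, // bar
		0, 4, 4, 1, 0, // self
		0, 6, 1, 1, 0, // v
		0, 3, 3, 2, 0, // int (parameter type)
		0, 8, 3, 2, 0, // int (return type)
	}
	tokens, err := decode(data, legend)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, t := range tokens {
		fmt.Printf("%d: line %d col %d len %d type %s\n", i, t.Line, t.Start, t.Length, t.Type)
	}
}
```

The position of each decoded token in the resulting slice is the kind of integer that FunctionSymbol returns.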

@SamYuan1990
Author

Please let me know once there is a PR with further examples. :-)

@Hoblovski
Collaborator

Please let me know once there is a PR with further examples. :-)

I do not have much time for a PR, but I wrote this up:

How FunctionSymbol is used:

LanguageSpec.FunctionSymbol

->
collect.go:functionInfo.{Outputs,Inputs,TypeParams}

->
if info.Outputs != nil {
	for _, output := range info.OutputsSorted {
		tok, _ := c.cli.Locate(output.Location)
		tyid, err := c.exportSymbol(repo, output.Symbol, tok, visited)
		if err != nil {
			continue
		}
		dep := uniast.NewDependency(*tyid, c.fileLine(output.Location))
		obj.Results = uniast.InsertDependency(obj.Results, dep)
	}
}

So it's used to build dependencies.
Specifically, suppose function f has return type T; then there should be a dependency f -> T, right?
Currently we assume the LSP highlights T as one or more semantic tokens, and we extract them into FunctionSymbol.Outputs.

You mentioned integers. They are indices into sym.Tokens, so each int represents one semantic token in the symbol.
So FunctionSymbol actually returns the tokens in the function symbol that are its typeParams/inputs/outputs.

(Disclaimer) Yes, there are better ways to do it that can provide more accurate info.

Another example

code:

class Bar:
    pass
class Foo:
    def met(self) -> tuple[Bar, Foo]:
        return Bar(), self

Foo.met() should (without filtering out any unneeded deps) depend on Bar, Foo, and tuple.
Let's check the logs:

[INFO]15:59:44 spec.go:286: PythonSpec: FunctionSymbol for met
[INFO]15:59:44 spec.go:288: token: met, type: function, modifiers: []
[INFO]15:59:44 spec.go:288: token: self, type: parameter, modifiers: []
[INFO]15:59:44 spec.go:288: token: tuple, type: class, modifiers: []
[INFO]15:59:44 spec.go:288: token: Bar, type: class, modifiers: []
[INFO]15:59:44 spec.go:288: token: Foo, type: class, modifiers: []
[INFO]15:59:44 spec.go:288: token: Bar, type: class, modifiers: []
[INFO]15:59:44 spec.go:288: token: self, type: parameter, modifiers: []
[INFO]15:59:44 spec.go:381: PythonSpec: FunctionSymbol for met, receiver: -1, typeParams: [], inputParams: [], outputParams: [2 3 4]

So met() has the semantic tokens:

	0   1    2     3   4   5   6
	met self tuple Bar Foo Bar self

FunctionSymbol returned outputParams = [2 3 4], saying "the output tokens are 2(tuple), 3(Bar), 4(Foo)".
Matches intuition.
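
As a simplified picture of the last step, the Go sketch below turns output token indices into dependency edges for met. It works on token names only; the `Dependency` type and `outputDeps` helper are hypothetical, and the real collector resolves each token to its defining symbol rather than comparing names.

```go
package main

import "fmt"

// Dependency is a hypothetical edge "From depends on To".
type Dependency struct {
	From, To string
}

// outputDeps builds one dependency per distinct output token.
// Invalid indices are skipped, and repeated names are recorded once.
func outputDeps(fn string, tokens []string, outputs []int) []Dependency {
	seen := make(map[string]bool)
	var deps []Dependency
	for _, i := range outputs {
		if i < 0 || i >= len(tokens) {
			continue
		}
		name := tokens[i]
		if seen[name] {
			continue
		}
		seen[name] = true
		deps = append(deps, Dependency{From: fn, To: name})
	}
	return deps
}

func main() {
	// Semantic tokens of met, in the order shown in the log.
	tokens := []string{"met", "self", "tuple", "Bar", "Foo", "Bar", "self"}
	for _, d := range outputDeps("Foo.met", tokens, []int{2, 3, 4}) {
		fmt.Printf("%s -> %s\n", d.From, d.To)
	}
}
```

No filtering is applied here, so tuple shows up as a dependency alongside Bar and Foo, as described above.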

Explanation

So it's all about the token index:

	// if a symbol is an impl symbol, return the token index of interface type, receiver type and first-method start (-1 means not found)
	// ortherwise the collector will use FunctionSymbol() as receiver type token index (-1 means not found)
	ImplSymbol(sym DocumentSymbol) (int, int, int)

	// if a symbol is a Function or Method symbol,  return the token index of Receiver (-1 means not found),TypeParameters, InputParameters and Outputs
	FunctionSymbol(sym DocumentSymbol) (receiver int, typeParams []int, inputParams []int, output_parameter []int)

@SamYuan1990
Author

emmm, I will catch up if I have time this week.

@welkeyever
Member

Another TypeScript PR, #72, is being reviewed. Is there any possibility of cooperation?

@SamYuan1990
Author

Another TypeScript PR, #72, is being reviewed. Is there any possibility of cooperation?

I am fine with either implementation approach.
This PR reuses an existing tool, but I am stuck on the data-processing logic for now. I took a quick look at #72; it seems to take a JS/TS-native approach, but it appears to cover only the JS/TS part.

@SamYuan1990
Author

Another TypeScript PR, #72, is being reviewed. Is there any possibility of cooperation?

In my opinion, an AST parser and a language server are different implementations, so the data-processing logic should be language-specific and work with either an AST parser or a language server.
